About Characters, Character Sets, and Character Encodings
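A character is the general term for all kinds of letters and symbols, including the scripts of every country, punctuation marks, graphic symbols, digits, and so on. A character is therefore not limited to what we usually call text; it is an abstract unit.
A character set is the collection of all abstract characters that a system supports. There are many character sets, and each contains a different number of characters.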
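To make the distinction concrete, here is a minimal sketch showing that a single abstract character has one code point but is stored as different bytes depending on the encoding. The class name `EncodingDemo` and the helper `toHex` are hypothetical names chosen for this illustration, and the GBK line assumes the running JVM ships that charset (the code checks before using it).

```java
import java.nio.charset.Charset;

class EncodingDemo {
    // Formats bytes as space-separated uppercase hex, e.g. "E4 B8 AD".
    static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // One abstract character: the CJK ideograph U+4E2D, written as an
        // escape so the result does not depend on the source file encoding.
        String ch = "\u4E2D";
        System.out.printf("code point: U+%04X%n", ch.codePointAt(0));

        String[] names = {"UTF-8", "UTF-16BE", "GBK"};
        for (String name : names) {
            // GBK is an assumption: some minimal runtimes omit it.
            if (!Charset.isSupported(name)) {
                System.out.printf("%-10s (not supported by this JVM)%n", name);
                continue;
            }
            Charset cs = Charset.forName(name);
            System.out.printf("%-10s %s%n", cs.name(), toHex(ch.getBytes(cs)));
        }
    }
}
```

On a JVM that supports all three charsets, the character is encoded as `E4 B8 AD` in UTF-8, `4E 2D` in UTF-16BE, and `D6 D0` in GBK: the same character, three different byte sequences.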
The following Java program prints the canonical names of all character sets supported by the current JVM:
import java.nio.charset.Charset;
import java.util.Map;

public class CharacterSetList {
    public static void main(String[] args) {
        // Map from canonical charset name to Charset object, sorted by name
        Map<String, Charset> charsets = Charset.availableCharsets();
        int count = 0;
        for (Charset charset : charsets.values()) {
            // Print each canonical name left-aligned in a 20-character
            // column (the "-" flag means left-aligned)
            System.out.printf("%-20s", charset.name());
            count++;
            // Start a new row after every five names
            if (count % 5 == 0) {
                System.out.println();
            }
        }
        // Terminate the last row if it is incomplete
        if (count % 5 != 0) {
            System.out.println();
        }
    }
}
The output is as follows (the exact list depends on the JDK version and platform):
Big5 Big5-HKSCS CESU-8 EUC-JP EUC-KR
GB18030 GB2312 GBK IBM-Thai IBM00858
IBM01140 IBM01141 IBM01142 IBM01143 IBM01144
IBM01145 IBM01146 IBM01147 IBM01148 IBM01149
IBM037 IBM1026 IBM1047 IBM273 IBM277
IBM278 IBM280 IBM284 IBM285 IBM290
IBM297 IBM420 IBM424 IBM437 IBM500
IBM775 IBM850 IBM852 IBM855 IBM857
IBM860 IBM861 IBM862 IBM863 IBM864
IBM865 IBM866 IBM868 IBM869 IBM870
IBM871 IBM918 ISO-2022-CN ISO-2022-JP ISO-2022-JP-2
ISO-2022-KR ISO-8859-1 ISO-8859-13 ISO-8859-15 ISO-8859-16
ISO-8859-2 ISO-8859-3 ISO-8859-4 ISO-8859-5 ISO-8859-6
ISO-8859-7 ISO-8859-8 ISO-8859-9 JIS_X0201 JIS_X0212-1990
KOI8-R KOI8-U Shift_JIS TIS-620 US-ASCII
UTF-16 UTF-16BE UTF-16LE UTF-32 UTF-32BE
UTF-32LE UTF-8 windows-1250 windows-1251 windows-1252
windows-1253 windows-1254 windows-1255 windows-1256 windows-1257
windows-1258 windows-31j x-Big5-HKSCS-2001 x-Big5-Solaris x-euc-jp-linux
x-EUC-TW x-eucJP-Open x-IBM1006 x-IBM1025 x-IBM1046
x-IBM1097 x-IBM1098 x-IBM1112 x-IBM1122 x-IBM1123
x-IBM1124 x-IBM1129 x-IBM1166 x-IBM1364 x-IBM1381
x-IBM1383 x-IBM29626C x-IBM300 x-IBM33722 x-IBM737
x-IBM833 x-IBM834 x-IBM856 x-IBM874 x-IBM875
x-IBM921 x-IBM922 x-IBM930 x-IBM933 x-IBM935
x-IBM937 x-IBM939 x-IBM942 x-IBM942C x-IBM943
x-IBM943C x-IBM948 x-IBM949 x-IBM949C x-IBM950
x-IBM964 x-IBM970 x-ISCII91 x-ISO-2022-CN-CNS x-ISO-2022-CN-GB
x-iso-8859-11 x-JIS0208 x-JISAutoDetect x-Johab x-MacArabic
x-MacCentralEurope x-MacCroatian x-MacCyrillic x-MacDingbat x-MacGreek
x-MacHebrew x-MacIceland x-MacRoman x-MacRomania x-MacSymbol
x-MacThai x-MacTurkish x-MacUkraine x-MS932_0213 x-MS950-HKSCS
x-MS950-HKSCS-XP x-mswin-936 x-PCK
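The names listed above are canonical names. Java also accepts aliases, and lookups are case-insensitive. The following is a rough sketch of resolving a name or alias to its canonical name; the class `CharsetLookup` and the method `canonicalName` are hypothetical names used only for this example, and whether a given charset such as GBK resolves depends on the JVM.

```java
import java.nio.charset.Charset;

class CharsetLookup {
    // Returns the canonical name for a charset name or alias,
    // or null if this JVM does not support it or the name is illegal.
    static String canonicalName(String nameOrAlias) {
        try {
            if (!Charset.isSupported(nameOrAlias)) {
                return null;
            }
            return Charset.forName(nameOrAlias).name();
        } catch (IllegalArgumentException e) {
            // Thrown for syntactically illegal charset names
            return null;
        }
    }

    public static void main(String[] args) {
        String[] candidates = {"utf8", "latin1", "GBK", "no-such-charset"};
        for (String candidate : candidates) {
            System.out.printf("%-16s -> %s%n", candidate, canonicalName(candidate));
        }
        // The charset the JVM uses when none is specified explicitly
        System.out.println("default: " + Charset.defaultCharset().name());
    }
}
```

Checking `Charset.isSupported` first avoids an `UnsupportedCharsetException` when the name is well-formed but unknown to the JVM.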