
GB2312, GBK, GB18030, Unicode, UTF-8, UTF-16 and more: character sets and encodings explained once and for all

On characters, character sets, and character encodings

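A character is the general term for all kinds of letters and symbols, including the scripts of every country, punctuation marks, graphic symbols, digits, and so on. A character is therefore not just what we usually call text; it is an abstract entity.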

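A character set is the collection of all abstract characters that a system supports. There are many character sets, and each contains a different number of characters.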
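To make the distinction between an abstract character and its encoded form concrete, here is a minimal sketch that encodes one character with several charsets and prints the resulting bytes. The class name `EncodingDemo` and the helper `toHex` are hypothetical names chosen for illustration. GBK is not guaranteed to be present on every JVM, so the sketch checks for it first.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class EncodingDemo {

    // Encode the text with the given charset and render the bytes as hex.
    static String toHex(String text, Charset charset) {
        StringBuilder sb = new StringBuilder();
        for (byte b : text.getBytes(charset)) {
            sb.append(String.format("%02X ", b));
        }
        return sb.toString().trim();
    }

    public static void main(String[] args) {
        // U+4E2D, written as an escape so the source file encoding does not matter
        String text = "\u4e2d";

        System.out.println("UTF-8    : " + toHex(text, StandardCharsets.UTF_8));
        System.out.println("UTF-16BE : " + toHex(text, StandardCharsets.UTF_16BE));

        // Assumption: GBK ships with most desktop JDKs but is optional
        if (Charset.isSupported("GBK")) {
            System.out.println("GBK      : " + toHex(text, Charset.forName("GBK")));
        }
    }
}
```

The same abstract character comes out as a different byte sequence, and a different number of bytes, under each encoding.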

The following code prints the canonical names of all character sets supported by the current JVM:

import java.nio.charset.Charset;
import java.util.Map;

public class CharacterSetList {

    public static void main(String[] args) {
        // Map from canonical charset name to Charset object
        Map<String, Charset> charsets = Charset.availableCharsets();

        int count = 0;
        for (Charset charset : charsets.values()) {
            // Print the canonical name in a 20-character column
            // (the '-' flag means left-aligned)
            System.out.print(String.format("%-20s", charset.name()));
            count++;
            // Start a new row after every five names
            if (count % 5 == 0) {
                System.out.println();
            }
        }
        // Terminate the last row if it is incomplete
        if (count % 5 != 0) {
            System.out.println();
        }
    }
}

The output is as follows (the exact list depends on the JVM and platform):

Big5                Big5-HKSCS          CESU-8              EUC-JP              EUC-KR
GB18030             GB2312              GBK                 IBM-Thai            IBM00858
IBM01140            IBM01141            IBM01142            IBM01143            IBM01144
IBM01145            IBM01146            IBM01147            IBM01148            IBM01149
IBM037              IBM1026             IBM1047             IBM273              IBM277
IBM278              IBM280              IBM284              IBM285              IBM290
IBM297              IBM420              IBM424              IBM437              IBM500
IBM775              IBM850              IBM852              IBM855              IBM857
IBM860              IBM861              IBM862              IBM863              IBM864
IBM865              IBM866              IBM868              IBM869              IBM870
IBM871              IBM918              ISO-2022-CN         ISO-2022-JP         ISO-2022-JP-2
ISO-2022-KR         ISO-8859-1          ISO-8859-13         ISO-8859-15         ISO-8859-16
ISO-8859-2          ISO-8859-3          ISO-8859-4          ISO-8859-5          ISO-8859-6
ISO-8859-7          ISO-8859-8          ISO-8859-9          JIS_X0201           JIS_X0212-1990
KOI8-R              KOI8-U              Shift_JIS           TIS-620             US-ASCII
UTF-16              UTF-16BE            UTF-16LE            UTF-32              UTF-32BE
UTF-32LE            UTF-8               windows-1250        windows-1251        windows-1252
windows-1253        windows-1254        windows-1255        windows-1256        windows-1257
windows-1258        windows-31j         x-Big5-HKSCS-2001   x-Big5-Solaris      x-euc-jp-linux
x-EUC-TW            x-eucJP-Open        x-IBM1006           x-IBM1025           x-IBM1046
x-IBM1097           x-IBM1098           x-IBM1112           x-IBM1122           x-IBM1123
x-IBM1124           x-IBM1129           x-IBM1166           x-IBM1364           x-IBM1381
x-IBM1383           x-IBM29626C         x-IBM300            x-IBM33722          x-IBM737
x-IBM833            x-IBM834            x-IBM856            x-IBM874            x-IBM875
x-IBM921            x-IBM922            x-IBM930            x-IBM933            x-IBM935
x-IBM937            x-IBM939            x-IBM942            x-IBM942C           x-IBM943
x-IBM943C           x-IBM948            x-IBM949            x-IBM949C           x-IBM950
x-IBM964            x-IBM970            x-ISCII91           x-ISO-2022-CN-CNS   x-ISO-2022-CN-GB
x-iso-8859-11       x-JIS0208           x-JISAutoDetect     x-Johab             x-MacArabic
x-MacCentralEurope  x-MacCroatian       x-MacCyrillic       x-MacDingbat        x-MacGreek
x-MacHebrew         x-MacIceland        x-MacRoman          x-MacRomania        x-MacSymbol
x-MacThai           x-MacTurkish        x-MacUkraine        x-MS932_0213        x-MS950-HKSCS
x-MS950-HKSCS-XP    x-mswin-936         x-PCK 
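A list like this is mostly useful for checking whether a particular charset is available before relying on it. The sketch below resolves a name or alias to its canonical name; `CharsetLookup` and `canonicalName` are hypothetical names for illustration, and which of the sample names resolve depends on the JVM in use.

```java
import java.nio.charset.Charset;

public class CharsetLookup {

    // Return the canonical name for a charset name or alias,
    // or null when this JVM does not support it.
    static String canonicalName(String nameOrAlias) {
        if (!Charset.isSupported(nameOrAlias)) {
            return null;
        }
        return Charset.forName(nameOrAlias).name();
    }

    public static void main(String[] args) {
        // Sample names only; "no-such-charset" is deliberately bogus
        String[] names = {"utf-8", "GBK", "GB18030", "no-such-charset"};
        for (String name : names) {
            String canonical = canonicalName(name);
            if (canonical == null) {
                System.out.println(name + " -> not supported");
            } else {
                // aliases() lists the other names that resolve to this charset
                System.out.println(name + " -> " + canonical
                        + " " + Charset.forName(canonical).aliases());
            }
        }
    }
}
```

Charset names are not case-sensitive, so `utf-8` and `UTF-8` resolve to the same charset.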