CJKV character 次 in traditional and simplified Chinese, Korean, Vietnamese and Japanese forms

The Chinese, Japanese and Korean (CJK) scripts share a common background, and the characters they share are collectively known as CJK characters. During the process called Han unification, the common (shared) characters were identified and named CJK Unified Ideographs. As of Unicode 16.0, Unicode defines a total of 97,680 CJK unified ideographs.[1]

The term ideographs is a misnomer, as the Chinese script is not ideographic but rather logographic.

Until the early 20th century, Vietnam also used Chinese characters (Chữ Nôm), so sometimes the abbreviation CJKV is used.

Sources

The Ideographic Research Group (IRG) is responsible for developing extensions to the encoded repertoires of CJK unified ideographs. IRG processes proposals for new CJK unified ideographs submitted by its member bodies, and after undergoing several rounds of expert review, IRG submits a consolidated set of characters to ISO/IEC JTC 1/SC 2 Working Group 2 (WG2) and the Unicode Technical Committee (UTC) for consideration for inclusion in the ISO/IEC 10646 and Unicode standards. The following IRG member bodies have been involved in the standardization of CJK unified ideographs:

The ideographs submitted by the UTC and the United Kingdom are not specific to any particular region, but are characters which have been suggested for encoding by individual experts. The ideographs submitted by SAT are required for the SAT Daizōkyō text database.

The table below gives the numbers of encoded CJK unified ideographs for each IRG source for Unicode 16.0.[4] The total number of characters (260,840) far exceeds the number of encoded CJK unified ideographs (97,680) as many characters have more than one source.

Country or region | Character count
China | 66,564
Hong Kong | 17,654
Macau | 344
Taiwan (TCA) | 58,601
Japan | 52,560
South Korea | 20,874
North Korea | 23,975
Vietnam | 13,284
United Kingdom | 2,503
SAT | 3,455
UTC | 1,026
Total | 260,840
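
The relationship between the two totals can be checked with a few lines of Python. The sketch below copies the per-source counts from the table above into a dictionary (the variable names are illustrative and do not come from any Unicode data file) and compares their sum with the number of encoded ideographs.

```python
# Per-source character counts for Unicode 16.0, copied from the table above.
irg_source_counts = {
    "China": 66_564,
    "Hong Kong": 17_654,
    "Macau": 344,
    "Taiwan (TCA)": 58_601,
    "Japan": 52_560,
    "South Korea": 20_874,
    "North Korea": 23_975,
    "Vietnam": 13_284,
    "United Kingdom": 2_503,
    "SAT": 3_455,
    "UTC": 1_026,
}

# Number of encoded CJK unified ideographs stated in the text.
ENCODED_IDEOGRAPHS = 97_680

# Every source reference is counted once, so a character with
# several sources contributes several times to this sum.
total_source_references = sum(irg_source_counts.values())

# Average number of IRG sources per encoded ideograph.
average_sources = total_source_references / ENCODED_IDEOGRAPHS

print(total_source_references)      # 260840
print(round(average_sources, 2))    # 2.67
```

On average, then, each encoded ideograph is referenced by between two and three IRG sources.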

UTC sources

The majority of characters submitted by the UTC to the IRG are derived from Unicode Technical Committee (UTC) documents.[5] Other sources include:

Ordering

The ordering of CJK Unified Ideographs within Unicode blocks (not counting those added to the block later) was initially determined by consulting the following four dictionaries. Characters were arranged primarily in Kangxi Dictionary order. For a character not found in the Kangxi Dictionary, the remaining dictionaries were consulted in the order listed to determine which Kangxi Dictionary character it should follow.[6]

  1. Kangxi Dictionary
  2. Dai Kan-Wa Jiten
  3. Hanyu Da Zidian
  4. Dae Jaweon

This system is not used for more recently added Unicode blocks. The Ideographic Research Group no longer uses the Dae Jaweon[7] or the Dai Kan-Wa Jiten[8] in its work. The Kangxi Dictionary and Hanyu Da Zidian are still used,[7] both in existing character source references[9] and as potential replacements for existing source references discovered to be erroneous.[10] Similarly, although a (real or virtual) Kangxi Dictionary index was previously provided as part of the submission data for UTC-source characters, this is no longer the case.[11] Instead, the stroke type of the first residual stroke (the first stroke which does not form part of the radical) is supplied with all submitted characters, and is used to order characters with the same radical and stroke count within the new Unicode block.[12]
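
The ordering rule for newer blocks can be pictured as a three-part sort key: radical, then residual stroke count, then first residual stroke type. The following is a minimal sketch under stated assumptions: the `Submission` record, its field names, the sample data, and the coding of stroke types as integers 1 to 5 are all hypothetical and chosen only for illustration; they are not taken from an IRG submission format.

```python
from typing import NamedTuple


class Submission(NamedTuple):
    label: str             # hypothetical identifier for a submitted character
    radical: int           # Kangxi radical number (1-214)
    residual_strokes: int  # strokes that do not belong to the radical
    first_stroke: int      # type of the first residual stroke (assumed coded 1-5)


def block_order_key(s: Submission) -> tuple:
    """Sort by radical, then stroke count, then first residual stroke type."""
    return (s.radical, s.residual_strokes, s.first_stroke)


# Made-up working set: "b" and "c" share radical and stroke count,
# so only the first residual stroke decides their relative order.
working_set = [
    Submission("c", 85, 4, 3),
    Submission("a", 85, 3, 5),
    Submission("d", 86, 1, 1),
    Submission("b", 85, 4, 1),
]

ordered = sorted(working_set, key=block_order_key)
print([s.label for s in ordered])  # ['a', 'b', 'c', 'd']
```

The first residual stroke acts only as a tie-breaker here; characters that differ in radical or stroke count are never reordered by it.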

The basic block named CJK Unified Ideographs (4E00–9FFF) contains 20,992 basic Chinese characters in the range U+4E00 through U+9FFF. The block includes not only characters used in the Chinese writing system but also kanji used in the Japanese writing system, hanja used in Korean, and chữ Nôm characters used in Vietnamese. Many characters in this block are used in all of these writing systems, while others appear in only some of them.
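
As a small illustration of the range stated above, the following Python sketch tests whether a character falls inside the basic block and confirms that the range holds 20,992 code points. The constant and function names are made up for this example.

```python
# Bounds of the basic CJK Unified Ideographs block, as given in the text.
URO_FIRST = 0x4E00
URO_LAST = 0x9FFF


def in_basic_block(ch: str) -> bool:
    """Return True if the single character ch lies in U+4E00..U+9FFF."""
    return URO_FIRST <= ord(ch) <= URO_LAST


# Size of the range, inclusive of both end points.
print(URO_LAST - URO_FIRST + 1)   # 20992

print(in_basic_block("\u6F22"))   # True  (U+6F22 漢)
print(in_basic_block("A"))        # False
```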

This block is also known as the Unified Repertoire and Ordering (URO), especially when it needs to be differentiated from the other CJK Unified Ideographs blocks.[13]

The first 20,902 characters in the block are arranged according to the Kangxi Dictionary ordering of radicals. In this system the characters written with the fewest strokes are listed first. The remaining characters were added later, and so are not in radical order.

The block is the result of Han unification,[14] which was somewhat controversial within East Asia.[15] Since single characters used in more than one of Chinese, Japanese and Korean were coded in the same location, and the modern typographical conventions and handwriting curricula differ slightly between regions (not necessarily along language boundaries; for example, Hong Kong and Taiwan, which both use Traditional Chinese, have slightly different local conventions),[16] the appearance of a selected glyph could depend on the particular font being used. However, the URO applies the source separation rule, meaning that pairs of characters treated as distinct in a character set used as a source for the URO (for example JIS X 0208, as used in Shift JIS) would remain pairs of separate characters in the new Unicode encoding.[17]

Using variation selectors, it is possible to specify certain variant CJK ideograms within Unicode.[18] The Adobe-Japan1 character set, which has 14,684 ideographic variation sequences,[19] is an extreme example of the use of variation selectors.[20]
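
An ideographic variation sequence is a base ideograph followed by one variation selector from the range U+E0100 through U+E01EF (VARIATION SELECTOR-17 to VARIATION SELECTOR-256). The Python sketch below shows how such a sequence is formed and how the selectors can be stripped again. The helper names are hypothetical, and the pairing of U+845B with U+E0100 is used purely as an illustration; whether a particular sequence is registered has to be checked against the Ideographic Variation Database.

```python
# Variation selectors used for ideographic variation sequences.
IVS_FIRST = 0xE0100  # VARIATION SELECTOR-17
IVS_LAST = 0xE01EF   # VARIATION SELECTOR-256


def is_ivs_selector(ch: str) -> bool:
    """True if ch is one of the supplementary variation selectors."""
    return IVS_FIRST <= ord(ch) <= IVS_LAST


def strip_variation_selectors(text: str) -> str:
    """Drop the selectors, leaving only the base characters."""
    return "".join(ch for ch in text if not is_ivs_selector(ch))


base = "\u845B"                  # base ideograph U+845B
sequence = base + "\U000E0100"   # base followed by VARIATION SELECTOR-17

print(len(sequence))                               # 2 code points
print(strip_variation_selectors(sequence) == base) # True
```

Software that does not support a given sequence is expected to fall back to the base character, which is why stripping the selector is a reasonable way to compare text for equality of content.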

Charts

4E00-62FF,6300-77FF,7800-8CFF,8D00-9FFF.

Sources

Note: Most characters appear in multiple sources, so the sum of individual character counts (108,480) is far greater than the number of encoded characters (20,992).[21]

Country or region | Code | Source[23] | Character count | Total
China | G0 | GB 2312-80 | 6,763 | 20,933
 | G1 | GB 12345-90 (Traditional Chinese analogue to GB 2312-80) | 2,202 |
 | G3 | GB 13131 (unpublished Traditional Chinese analogue to GB 7589-87) | 4,834 |
 | G5 | GB 13132 (unpublished Traditional Chinese analogue to GB 7590-87) | 2,841 |
 | G7 | Modern Chinese general character chart (Simplified Chinese: 现代汉语通用字表) | 42 |
 | G8 | GB 8565-88 | 203 |
 | GCE | National Academy for Educational Research | 4 |
 | GDM | Place name characters from the Public Order Administration, Ministry of Public Security of the People's Republic of China | 2 |
 | GE | GB 16500-95 | 3,770 |
 | GFC | Modern Chinese Standard Dictionary (现代汉语规范词典第二版) | 2 |
 | GGFZ | Tongyong Guifan Hanzi Zidian (通用规范汉字字典) | 1 |
 | GH | GB/T 15564-1995 | 59 |
 | GHZ | Hanyu Da Zidian (漢語大字典) | 1 |
 | GHZR | Hanyu Da Zidian 2nd ed. (汉语大字典, 第二版) | 1 |
 | GK | GB 12052-89 | 89 |
 | GKJ | Terms in Sciences and Technologies (科技用字) approved by the China National Committee for Terms in Sciences and Technologies (CNCTST) | 16 |
 | GKX | Kangxi Dictionary (康熙字典) | 5 |
 | GLK | Longkan Shoujian (龍龕手鑑) | 1 |
 | GT | Standard Telegraph Codebook (revised), 1983 | 8 |
 | GU | No source (the original source reference may have been moved) | 88 |
 | GZFY | Hanyu Fangyan Dacidian (汉语方言大词典) | 1 |
Hong Kong | H | Hong Kong Supplementary Character Set, 2008 | 2,292 | 15,376
 | HB0 | Computer Chinese Glyph and Character Code Mapping Table, Technical Report C-26 (電腦用中文字型與字碼對照表, 技術通報C-26) | 9 |
 | HB1 | Big-5, Level 1 | 5,401 |
 | HB2 | Big-5, Level 2 | 7,650 |
 | HD | Hong Kong Supplementary Character Set, 2016 | 24 |
Japan | J0 | JIS X 0208-1990 | 6,356 | 18,249
 | J1 | JIS X 0212-1990 | 3,058 |
 | J13 | JIS X 0213:2004 level-3 characters replacing J1 characters | 1,037 |
 | J13A | JIS X 0213:2004 level-3 character addendum from JIS X 0213:2000 level-3 replacing J1 character | 2 |
 | J14 | JIS X 0213:2004 level-4 characters replacing J1 characters | 1,704 |
 | J3 | JIS X 0213:2004 Level 3 | 95 |
 | J3A | JIS X 0213:2004 Level 3 addendum | 7 |
 | J4 | JIS X 0213:2004 Level 4 | 301 |
 | JARIB | ARIB STD-B24 | 3 |
 | JMJ | Character Information Development and Maintenance Project for e-Government "MojiJoho-Kiban Project" (文字情報基盤整備事業) | 5,686 |
South Korea | K0 | KS C 5601-87 (now KS X 1001:2004) | 4,620 | 15,442
 | K1 | KS C 5657-91 (now KS X 1002:2001) | 2,855 |
 | K2 | PKS C 5700-1:1994 (now KS X 1027-1:2011) | 7,911 |
 | K3 | PKS C 5700-2:1994 (now KS X 1027-2:2011) | 1 |
 | K4 | PKS C 5700-3:1998 (now KS X 1027-3:2011) | 4 |
 | K6 | KS X 1027-5:2014 | 49 |
 | KC | Korean History On-Line (한국 역사 정보 통합 시스템) | 1 |
 | KU | No source (the original source reference may have been moved) | 1 |
North Korea | KP0 | KPS 9566-97 | 4,652 | 15,010
 | KP1 | KPS 10721-2000 | 10,358 |
Macau | MA | HKSCS-2008 | 29 | 200
 | MB1 | Big Five | 10 |
 | MB2 | Big Five | 7 |
 | MC | MCSCS Reference | 3 |
 | MD | MCSCS horizontal extensions | 127 |
 | MDH | MCSCS horizontal extensions | 24 |
Taiwan | T1 | CNS 11643-1992 plane 1 | 5,413 | 18,384
 | T2 | CNS 11643-1992 plane 2 | 7,651 |
 | T3 | CNS 11643-1992 plane 3 | 4,144 |
 | T4 | CNS 11643-1992 plane 4 | 894 |
 | T5 | CNS 11643-1992 plane 5 | 64 |
 | T6 | CNS 11643-1992 plane 6 | 31 |
 | T7 | CNS 11643-1992 plane 7 | 16 |
 | TB | CNS 11643-2007 plane 11 | 2 |
 | TC | CNS 11643-2007 plane 12 | 2 |
 | TE | CNS 11643-2007 plane 14 | 9 |
 | TF | CNS 11643-2007 plane 15 | 158 |
Vietnam | V0 | TCVN 5773:1993 | 599 | 4,808
 | V1 | TCVN 6056:1995 | 3,305 |
 | V2 | VHN 01-1998 | 759 |
 | V3 | VHN 02-1998 | 91 |
 | V4 | Kho Chữ Hán Nôm Mã Hoá (Hán Nôm Coded Character Repertoire) | 19 |
 | VN | Vietnamese horizontal extensions | 35 |
n/a | UTC | UTC sources | 78 | 78

In Unicode 4.1, 14 HKSCS-2004 characters and 8 GB 18030 characters were assigned to code points U+9FA6 through U+9FBB. Since then, further characters have been added to this block for various reasons, all summarized in the version history section below.

The block named CJK Unified Ideographs Extension A (3400–4DBF) contains 6,592 additional characters in the range U+3400 through U+4DBF.

Charts

3400-4DBF.

Sources

Note: Most characters appear in more than one source, so the sum of individual character counts (23,954) is far greater than the number of encoded characters (6,592).[21]

Country or region | Code | Source[23] | Character count | Total
China | G3 | GB 13131 (unpublished Traditional Chinese analogue to GB 7589-87) | 2,391 | 6,197
 | G5 | GB 13132 (unpublished Traditional Chinese analogue to GB 7590-87) | 1,226 |
 | G7 | Modern Chinese general character chart | 120 |
 | GGFZ | Tongyong Guifan Hanzi Zidian (通用规范汉字字典) | 2 |
 | GHZ | Hanyu Da Zidian (漢語大字典) | 340 |
 | GKJ | Terms in Sciences and Technologies (科技用字) approved by the China National Committee for Terms in Sciences and Technologies (CNCTST) | 3 |
 | GKX | Kangxi Dictionary (康熙字典) | 1,889 |
 | GS | Singapore Chinese characters 1 | 226 |
Hong Kong | H | Hong Kong Supplementary Character Set, 2008 | 572 | 572
Japan | J3 | JIS X 0213:2004 Level 3 | 2 | 5,856
 | J4 | JIS X 0213:2004 Level 4 | 78 |
 | JA | Japanese IT Vendors Contemporary Ideographs, 1993 | 574 |
 | JA3 | JIS X 0213:2004 level-3 characters replacing JA characters | 17 |
 | JA4 | JIS X 0213:2004 level-4 characters replacing JA characters | 67 |
 | JMJ | Character Information Development and Maintenance Project for e-Government "MojiJoho-Kiban Project" (文字情報基盤整備事業) | 5,118 |
South Korea | K3 | PKS C 5700-2:1994 (now KS X 1027-2:2011) | 1,833 | 1,867
 | K4 | PKS C 5700-3:1998 (now KS X 1027-3:2011) | 2 |
 | K6 | KS X 1027-5:2014 | 28 |
 | KC | Korean History On-Line (한국 역사 정보 통합 시스템) | 3 |
 | KU | No source (the original source reference may have been moved) | 1 |
North Korea | KP0 | KPS 9566-97 | 1 | 3,191
 | KP1 | KPS 10721-2000 | 3,190 |
Macau | MA | HKSCS-2008 | 4 | 12
 | MD | MCSCS horizontal extensions | 8 |
Taiwan | T3 | CNS 11643-1992 plane 3 | 2,179 | 5,916
 | T4 | CNS 11643-1992 plane 4 | 2,919 |
 | T5 | CNS 11643-1992 plane 5 | 399 |
 | T6 | CNS 11643-1992 plane 6 | 200 |
 | T7 | CNS 11643-1992 plane 7 | 133 |
 | TE | CNS 11643-2007 plane 14 | 1 |
 | TF | CNS 11643-2007 plane 15 | 85 |
United Kingdom | UK | IRG N2107R2 | 3 | 3
Vietnam | V0 | TCVN 5773:1993 | 140 | 319
 | V2 | VHN 01-1998 | 149 |
 | V3 | VHN 02-1998 | 19 |
 | V4 | Kho Chữ Hán Nôm Mã Hoá (Hán Nôm Coded Character Repertoire) | 5 |
 | VN | Vietnamese horizontal extensions | 6 |
n/a | UTC | UTC sources | 21 | 21

The block named CJK Unified Ideographs Extension B (20000–2A6DF) contains 42,720 characters in the range U+20000 through U+2A6DF. These include most of the characters used in the Kangxi Dictionary that are not in the basic CJK Unified Ideographs block, as well as many Hán-Nôm characters that were formerly used to write Vietnamese.

Charts

20000-215FF,21600-230FF,23100-245FF,24600-260FF,26100-275FF,27600-290FF,29100-2A6DF.

Sources

Note: Many characters appear in more than one source, so the sum of individual character counts (99,784) is far greater than the number of encoded characters (42,720).[21]

Country or region | Code | Source[23] | Character count | Total
China | G3 | GB 13131 (unpublished Traditional Chinese analogue to GB 7589-87) | 1 | 30,550
 | G4K | Siku Quanshu (四庫全書) | 477 |
 | GBK | Encyclopedia of China (中國大百科全書) | 86 |
 | GCH | Cihai (辞海) | 247 |
 | GCY | Ciyuan (辭源) | 66 |
 | GFZ | Founder Press System | 65 |
 | GGFZ | Tongyong Guifan Hanzi Zidian (通用规范汉字字典) | 5 |
 | GHC | Hanyu Da Cidian (漢語大詞典) | 553 |
 | GHF | Hanwen fodian yinan suzi huishi yu yanjiu (漢文佛典疑難俗字彙釋與研究) | 1 |
 | GHZ | Hanyu Da Zidian (漢語大字典) | 10,507 |
 | GHZR | Hanyu Da Zidian 2nd ed. (汉语大字典, 第二版) | 1 |
 | GKJ | Terms in Sciences and Technologies (科技用字) approved by the China National Committee for Terms in Sciences and Technologies (CNCTST) | 17 |
 | GKX | Kangxi Dictionary (康熙字典) | 18,469 |
 | GU | No source (the original source reference may have been moved) | 55 |
Hong Kong | H | Hong Kong Supplementary Character Set, 2008 | 1,703 | 1,703
Japan | J3 | JIS X 0213:2004 Level 3 | 25 | 25,745
 | J3A | JIS X 0213:2004 Level 3 addendum | 1 |
 | J4 | JIS X 0213:2004 Level 4 | 277 |
 | JMJ | Character Information Development and Maintenance Project for e-Government "MojiJoho-Kiban Project" (文字情報基盤整備事業) | 25,442 |
South Korea | K1 | KS C 5657-91 (now KS X 1002:2001) | 1 | 395
 | K4 | PKS C 5700-3:1998 (now KS X 1027-3:2011) | 166 |
 | K6 | KS X 1027-5:2014 | 214 |
 | KC | Korean History On-Line (한국 역사 정보 통합 시스템) | 14 |
North Korea | KP1 | KPS 10721-2000 | 5,765 | 5,765
Macau | MA | HKSCS-2008 | 9 | 38
 | MC | MCSCS Reference | 2 |
 | MD | MCSCS horizontal extensions | 27 |
Taiwan | T3 | CNS 11643-1992 plane 3 | 25 | 30,193
 | T4 | CNS 11643-1992 plane 4 | 3,408 |
 | T5 | CNS 11643-1992 plane 5 | 8,111 |
 | T6 | CNS 11643-1992 plane 6 | 5,934 |
 | T7 | CNS 11643-1992 plane 7 | 6,299 |
 | TA | CNS 11643-2007 plane 10 | 8 |
 | TB | CNS 11643-2007 plane 11 | 6 |
 | TC | CNS 11643-2007 plane 12 | 1 |
 | TF | CNS 11643-2007 plane 15 | 6,401 |
United Kingdom | UK | IRG N2107R2 | 12 | 12
Vietnam | V0 | TCVN 5773:1993 | 1,570 | 5,299
 | V1 | TCVN 6056:1995 | 1 |
 | V2 | VHN 01-1998 | 2,286 |
 | V3 | VHN 02-1998 | 422 |
 | V4 | Kho Chữ Hán Nôm Mã Hoá (Hán Nôm Coded Character Repertoire) | 33 |
 | VN | Vietnamese horizontal extensions | 987 |
Buddhist canon | SAT | SAT Daizōkyō Text Database | 1 | 1
n/a | UTC | UTC sources | 83 | 83

The block named CJK Unified Ideographs Extension C (2A700–2B73F) contains 4,154 characters in the range U+2A700 through U+2B739. It was initially added in Unicode 5.2 (2009).

Charts

2A700-2B73F.

Sources

Note: Some characters appear in more than one source, so the sum of individual character counts (4,634) is greater than the number of encoded characters (4,154).[21]

Country or region | Code | Source[23] | Character count | Total
China | GBK | Encyclopedia of China (中國大百科全書) | 74 | 1,130
 | GCH | Cihai (辞海) | 264 |
 | GCY | Ciyuan (辭源) | 1 |
 | GCYY | Chinese Academy of Surveying and Mapping ideographs | 55 |
 | GDM | Place name characters from the Public Order Administration, Ministry of Public Security of the People's Republic of China | 1 |
 | GFZ | Founder Press System | 1 |
 | GGFZ | Tongyong Guifan Hanzi Zidian (通用规范汉字字典) | 2 |
 | GGH | Gudai Hanyu Cidian (古代汉语词典) | 51 |
 | GHC | Hanyu Da Cidian (漢語大詞典) | 14 |
 | GHZ | Hanyu Da Zidian (漢語大字典) | 1 |
 | GHZR | Hanyu Da Zidian 2nd ed. (汉语大字典, 第二版) | 1 |
 | GJZ | Commercial Press ideographs | 61 |
 | GKJ | Terms in Sciences and Technologies (科技用字) approved by the China National Committee for Terms in Sciences and Technologies (CNCTST) | 6 |
 | GKX | Kangxi Dictionary (康熙字典) | 6 |
 | GXC | Xiandai Hanyu Cidian (现代汉语词典) | 25 |
 | GZFY | Hanyu Fangyan Dacidian (汉语方言大词典) | 202 |
 | GZJW | Yin Zhou Jinwen Jicheng Yinde (殷周金文集成引得) | 365 |
Hong Kong | H | Hong Kong Supplementary Character Set, 2008 | 1 | 1
Japan | JK | Japanese Kokuji Collection | 367 | 431
 | JMJ | Character Information Development and Maintenance Project for e-Government "MojiJoho-Kiban Project" (文字情報基盤整備事業) | 64 |
South Korea | K5 | Korean IRG Hanja Character Set (later became KS X 1027-4:2011) | 404 | 406
 | K6 | KS X 1027-5:2014 | 1 |
 | KC | Korean History On-Line (한국 역사 정보 통합 시스템) | 1 |
North Korea | KP1 | KPS 10721-2000 | 8 | 8
Macau | MC | MCSCS Reference | 17 | 21
 | MD | MCSCS horizontal extensions | 4 |
Taiwan | T5 | CNS 11643-1992 plane 5 | 1 | 1,752
 | TC | CNS 11643-2007 plane 12 | 634 |
 | TD | CNS 11643-2007 plane 13 | 766 |
 | TE | CNS 11643-2007 plane 14 | 350 |
 | TU | No source (the original source reference may have been moved) | 1 |
United Kingdom | UK | IRG N2107R2 | 1 | 1
Vietnam | V0 | TCVN 5773:1993 | 4 | 795
 | V1 | TCVN 6056:1995 | 2 |
 | V2 | VHN 01-1998 | 1 |
 | V4 | Kho Chữ Hán Nôm Mã Hoá (Hán Nôm Coded Character Repertoire) | 782 |
 | VN | Vietnamese horizontal extensions | 6 |
n/a | UTC | UTC sources | 89 | 89

The block named CJK Unified Ideographs Extension D (2B740–2B81F) contains 222 characters in the range U+2B740 through U+2B81D that were added in Unicode 6.0 (2010).

Charts

2B740–2B81F.

Sources

Note: Some characters appear in more than one source, so the sum of individual character counts (239) is greater than the number of encoded characters (222).[21]

Country or region | Code | Source[23] | Character count | Total
China | GCH | Cihai (辞海) | 1 | 78
 | GDM | Place name characters from the Public Order Administration, Ministry of Public Security of the People's Republic of China | 1 |
 | GIDC | ID System of the Ministry of Public Security of China | 9 |
 | GKJ | Terms in Sciences and Technologies (科技用字) approved by the China National Committee for Terms in Sciences and Technologies (CNCTST) | 2 |
 | GXC | Xiandai Hanyu Cidian (现代汉语词典) | 4 |
 | GXM | Characters for use in personal names in China from Public Order Administration, Ministry of Public Security of the People's Republic of China | 22 |
 | GZH | Zhonghua Zihai (中华字海) | 39 |
Japan | JH | Hanyo-Denshi Program (汎用電子情報交換環境整備プログラム) | 107 | 117
 | JMJ | Character Information Development and Maintenance Project for e-Government "MojiJoho-Kiban Project" (文字情報基盤整備事業) | 10 |
Taiwan | TB | CNS 11643-2007 plane 11 | 24 | 24
n/a | UTC | UTC sources | 20 | 20

The block named CJK Unified Ideographs Extension E (2B820–2CEAF) contains 5,762 characters in the range U+2B820 through U+2CEA1 that were added in Unicode 8.0 (2015).

Charts

2B820–2CEAF.

Sources

Note: Some characters appear in more than one source, so the sum of individual character counts (5,919) is greater than the number of encoded characters (5,762).[21]

Country or region | Code | Source[23] | Character count | Total
China | GBK | Encyclopedia of China (中國大百科全書) | 15 | 2,822
 | GCH | Cihai (辞海) | 112 |
 | GCY | Ciyuan (辭源) | 3 |
 | GCYY | Chinese Academy of Surveying and Mapping ideographs | 98 |
 | GDZ | Geology Press ideographs | 1 |
 | GGFZ | Tongyong Guifan Hanzi Zidian (通用规范汉字字典) | 4 |
 | GGH | Gudai Hanyu Cidian (古代汉语词典) | 175 |
 | GHC | Hanyu Da Cidian (漢語大詞典) | 7 |
 | GIDC | ID System of the Ministry of Public Security of China | 37 |
 | GJZ | Commercial Press ideographs | 147 |
 | GKJ | Terms in Sciences and Technologies (科技用字) approved by the China National Committee for Terms in Sciences and Technologies (CNCTST) | 2 |
 | GKX | Kangxi Dictionary (康熙字典) | 22 |
 | GRM | People's Daily ideographs | 3 |
 | GU | No source (the original source reference may have been moved) | 1 |
 | GWZ | Hanyu Da Cidian Press ideographs | 12 |
 | GXC | Xiandai Hanyu Cidian (现代汉语词典) | 57 |
 | GXH | Xinhua Zidian (新华字典) | 4 |
 | GZFY | Hanyu Fangyan Dacidian (汉语方言大词典) | 712 |
 | GZJW | Yin Zhou Jinwen Jicheng Yinde (殷周金文集成引得) | 1,410 |
Hong Kong | HD | Hong Kong Supplementary Character Set, 2016 | 1 | 1
Japan | JK | Japanese Kokuji Collection | 415 | 503
 | JMJ | Character Information Development and Maintenance Project for e-Government "MojiJoho-Kiban Project" (文字情報基盤整備事業) | 88 |
South Korea | KC | Korean History On-Line (한국 역사 정보 통합 시스템) | 7 | 7
Macau | MC | MCSCS Reference | 48 | 51
 | MD | MCSCS horizontal extensions | 3 |
Taiwan | T3 | CNS 11643-1992 plane 3 | 2 | 1,261
 | TB | CNS 11643-2007 plane 11 | 2 |
 | TC | CNS 11643-2007 plane 12 | 323 |
 | TD | CNS 11643-2007 plane 13 | 595 |
 | TE | CNS 11643-2007 plane 14 | 339 |
United Kingdom | UK | IRG N2107R2 | 2 | 2
Vietnam | V0 | TCVN 5773:1993 | 6 | 1,036
 | V2 | VHN 01-1998 | 1 |
 | V4 | Kho Chữ Hán Nôm Mã Hoá (Hán Nôm Coded Character Repertoire) | 1,023 |
 | VN | Vietnamese horizontal extensions | 6 |
n/a | UTC | UTC sources | 236 | 236

The block named CJK Unified Ideographs Extension F (2CEB0–2EBEF) contains 7,473 characters in the range U+2CEB0 through U+2EBE0 that were added in Unicode 10.0 (2017). It includes more than 1,000 Sawndip characters for Zhuang.

Charts

2CEB0–2EBEF.

Sources

Note: Some characters appear in more than one source, so the sum of individual character counts (7,775) is greater than the number of encoded characters (7,473).[21]

Country or region | Code | Source[23] | Character count | Total
China | GCY | Ciyuan (辭源) | 122 | 1,309
 | GFC | Modern Chinese Standard Dictionary (现代汉语规范词典第二版) | 27 |
 | GIDC | ID System of the Ministry of Public Security of China | 1 |
 | GKJ | Terms in Sciences and Technologies (科技用字) approved by the China National Committee for Terms in Sciences and Technologies (CNCTST) | 5 |
 | GLGYJ | Zhuang Liao Songs Research (壮族嘹歌研究) | 1 |
 | GOCD | Oxford English-Chinese Chinese-English Dictionary (牛津英汉汉英词典) | 2 |
 | GPGLG | Zhuang Folk Song Culture Series - Pingguo County Liao Songs (壮族民歌文化丛书•平果嘹歌) | 70 |
 | GXHZ | Xinhua Da Zidian (新华大字典) | 51 |
 | GZ | Ancient Zhuang Character Dictionary (古壮字字典) | 995 |
 | GZJW | Yin Zhou Jinwen Jicheng Yinde (殷周金文集成引得) | 33 |
 | GZYS | Chinese Ancient Ethnic Characters Research (中国民族古文字研究) | 2 |
Hong Kong | HD | Hong Kong Supplementary Character Set, 2016 | 1 | 1
Japan | JMJ | Character Information Development and Maintenance Project for e-Government "MojiJoho-Kiban Project" (文字情報基盤整備事業) | 1,646 | 1,646
South Korea | KC | Korean History On-Line (한국 역사 정보 통합 시스템) | 1,810 | 1,810
Macau | MC | MCSCS Reference | 22 | 22
Taiwan | T3 | CNS 11643-1992 plane 3 | 1 | 3
 | T6 | CNS 11643-1992 plane 6 | 1 |
 | TC | CNS 11643-2007 plane 12 | 1 |
United Kingdom | UK | IRG N2107R2 | 2 | 2
Vietnam | V0 | TCVN 5773:1993 | 1 | 17
 | V4 | Kho Chữ Hán Nôm Mã Hoá (Hán Nôm Coded Character Repertoire) | 8 |
 | VN | Vietnamese horizontal extensions | 8 |
Buddhist canon | SAT | SAT Daizōkyō Text Database | 2,884 | 2,884
n/a | UTC | UTC sources | 81 | 81

A block named CJK Unified Ideographs Extension G was added as part of Unicode 13.0 to the Tertiary Ideographic Plane in the range U+30000 through U+3134F, containing 4,939 characters.[22]

Charts

30000–3134F.

Sources

Note: Some characters appear in more than one source, so the sum of individual character counts (5,081) is greater than the number of encoded characters (4,939).[21]

Country or region | Code | Source[23] | Character count | Total
China | GHZR | Hanyu Da Zidian 2nd ed. (汉语大字典, 第二版) | 878 | 2,082
 | GPGLG | Zhuang Folk Song Culture Series - Pingguo County Liao Songs (壮族民歌文化丛书•平果嘹歌) | 13 |
 | GZ | Ancient Zhuang Character Dictionary (古壮字字典) | 1,191 |
South Korea | KC | Korean History On-Line (한국 역사 정보 통합 시스템) | 435 | 435
Taiwan | T13 | CNS 11643 (pending new version) plane 19 | 347 | 353
 | TB | CNS 11643-2007 plane 11 | 3 |
 | TC | CNS 11643-2007 plane 12 | 2 |
 | TD | CNS 11643-2007 plane 13 | 1 |
United Kingdom | UK | IRG N2107R2 | 1,566 | 1,566
Vietnam | V4 | Kho Chữ Hán Nôm Mã Hoá (Hán Nôm Coded Character Repertoire) | 6 | 76
 | VN | Vietnamese horizontal extensions | 70 |
Buddhist canon | SAT | SAT Daizōkyō Text Database | 329 | 329
n/a | UTC | UTC sources | 240 | 240

A block named CJK Unified Ideographs Extension H was added as part of Unicode 15.0 to the Tertiary Ideographic Plane in the range U+31350 through U+323AF, containing 4,192 characters.[23]

Charts

31350–323AF.

Sources

Note: Some characters appear in more than one source, so the sum of individual character counts (4,309) is greater than the number of encoded characters (4,192).[21]

Country or region | Code | Source[23] | Character count | Total
China | GDM | Place name characters from the Public Order Administration, Ministry of Public Security of the People's Republic of China | 128 | 829
 | GHC | Hanyu Da Cidian (漢語大詞典) | 27 |
 | GKJ | Terms in Sciences and Technologies (科技用字) approved by the China National Committee for Terms in Sciences and Technologies (CNCTST) | 30 |
 | GLGYJ | Zhuang Liao Songs Research (壮族嘹歌研究) | 11 |
 | GPGLG | Zhuang Folk Song Culture Series - Pingguo County Liao Songs (壮族民歌文化丛书•平果嘹歌) | 14 |
 | GU | No source (the original source reference may have been moved) | 1 |
 | GXM | Characters for use in personal names in China from Public Order Administration, Ministry of Public Security of the People's Republic of China | 216 |
 | GZ | Ancient Zhuang Character Dictionary (古壮字字典) | 285 |
 | GZA-1 | A Vibrant and Unbroken Transmission: Filial Piety and Zhuang Funeral Songs (生生不息的传承•孝与壮族行孝歌之研究) | 6 |
 | GZA-2 | Annotated Long Zhuang Morality Songs (壮族伦理道德长诗传扬歌译注) | 38 |
 | GZA-3 | Compendium of Old Zhuang Folksong Texts: Wooing Songs vol. 1: Liao Songs (壮族民歌古籍集成•情歌(一)嘹歌) | 2 |
 | GZA-4 | Compendium of Old Zhuang Folksong Texts: Wooing Songs vol. 2: Fwen Nganx (壮族民歌古籍集成•情歌(二)欢𭪤) | 11 |
 | GZA-6 | Zhuang Proverbs from China (中国壮族谚语) | 59 |
 | GZA-7 | Ancient Remembrance: Zhuang Creation Myth Songs (远古的追忆•壮族创世神话古歌研究) | 1 |
South Korea | KC | Korean History On-Line (한국 역사 정보 통합 시스템) | 512 | 512
North Korea | KP1 | KPS 10721-2000 | 1 | 1
Taiwan | T12 | CNS 11643 (pending new version) plane 18 | 7 | 714
 | T13 | CNS 11643 (pending new version) plane 19 | 696 |
 | T4 | CNS 11643-1992 plane 4 | 1 |
 | T6 | CNS 11643-1992 plane 6 | 1 |
 | TB | CNS 11643-2007 plane 11 | 5 |
 | TC | CNS 11643-2007 plane 12 | 3 |
 | TE | CNS 11643-2007 plane 14 | 1 |
United Kingdom | UK | IRG N2232R | 917 | 917
Vietnam | V0 | TCVN 5773:1993 | 6 | 931
 | V4 | Kho Chữ Hán Nôm Mã Hoá (Hán Nôm Coded Character Repertoire) | 74 |
 | VN | Vietnamese horizontal extensions | 851 |
Buddhist canon | SAT | SAT Daizōkyō Text Database | 241 | 241
n/a | UTC | UTC sources | 164 | 164

A block named CJK Unified Ideographs Extension I was added as part of Unicode 15.1 to the Supplementary Ideographic Plane in the range U+2EBF0 through U+2EE5F, containing 622 characters.[24]

Charts

2EBF0–2EE5F.

Sources

Note: Some characters appear in more than one source, making the sum of individual character counts (625) more than the number of encoded characters (622).[21]

Country or region | Code | Source[25] | Character count | Total
China | GIDC23 | ID system of the Ministry of Public Security of China, 2023 | 622 | 622
Japan | JMJ | Character Information Development and Maintenance Project for e-Government "MojiJoho-Kiban Project" (文字情報基盤整備事業) | 1 | 1
n/a | UTC | UTC sources | 2 | 2
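
The block ranges and character counts given in the sections above can be gathered into one lookup table. The following Python sketch does this, with hypothetical helper names `block_of` and `plane_of`; the ranges and counts are the ones stated in the text, and the short block labels are abbreviations used only in this example. The sum excludes the twelve unified ideographs that live in the CJK Compatibility Ideographs block.

```python
from typing import Optional

# (label, first code point, last code point of the block, ideographs stated in the text)
BLOCKS = [
    ("CJK Unified Ideographs", 0x4E00, 0x9FFF, 20_992),
    ("Extension A", 0x3400, 0x4DBF, 6_592),
    ("Extension B", 0x20000, 0x2A6DF, 42_720),
    ("Extension C", 0x2A700, 0x2B73F, 4_154),
    ("Extension D", 0x2B740, 0x2B81F, 222),
    ("Extension E", 0x2B820, 0x2CEAF, 5_762),
    ("Extension F", 0x2CEB0, 0x2EBEF, 7_473),
    ("Extension G", 0x30000, 0x3134F, 4_939),
    ("Extension H", 0x31350, 0x323AF, 4_192),
    ("Extension I", 0x2EBF0, 0x2EE5F, 622),
]


def block_of(ch: str) -> Optional[str]:
    """Return the label of the unified-ideograph block containing ch, if any."""
    cp = ord(ch)
    for name, first, last, _count in BLOCKS:
        if first <= cp <= last:
            return name
    return None


def plane_of(ch: str) -> int:
    """Plane number: 0 is the BMP, 2 the SIP, 3 the TIP."""
    return ord(ch) >> 16


stated_total = sum(count for _name, _first, _last, count in BLOCKS)
print(stated_total)                                        # 97668
print(block_of("\u6F22"), plane_of("\u6F22"))              # CJK Unified Ideographs 0
print(block_of("\U00020000"), plane_of("\U00020000"))      # Extension B 2
```

Adding the twelve unified ideographs from the compatibility block to 97,668 gives the overall figure of 97,680.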

The block named CJK Compatibility Ideographs (F900–FAFF) was created to retain round-trip compatibility with other standards.

However, twelve characters in this block actually have the “Unified Ideograph” property: U+FA0E 﨎, U+FA0F 﨏, U+FA11 﨑, U+FA13 﨓, U+FA14 﨔, U+FA1F 﨟, U+FA21 﨡, U+FA23 﨣, U+FA24 﨤, U+FA27 﨧, U+FA28 﨨, and U+FA29 﨩.[1] None of the other characters in this and other “Compatibility” blocks relate to CJK unification.

While 龜 and 亀 are not considered unifiable, U+FA20 蘒 CJK COMPATIBILITY IDEOGRAPH-FA20 is considered a duplicate of U+8612 蘒 CJK UNIFIED IDEOGRAPH-8612.
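
One observable difference between the twelve unified ideographs and a true compatibility duplicate such as U+FA20 is how they behave under normalization. This point is not made in the text above; it is offered as an illustration using Python's standard `unicodedata` module. The twelve characters have no decomposition mapping, so normalization leaves them alone, whereas U+FA20 decomposes to U+8612.

```python
import unicodedata

# The twelve code points listed above as having the Unified Ideograph property.
UNIFIED_IN_COMPAT_BLOCK = [
    0xFA0E, 0xFA0F, 0xFA11, 0xFA13, 0xFA14, 0xFA1F,
    0xFA21, 0xFA23, 0xFA24, 0xFA27, 0xFA28, 0xFA29,
]

# None of the twelve has a decomposition, so NFC keeps each one unchanged.
unchanged = all(
    unicodedata.decomposition(chr(cp)) == ""
    and unicodedata.normalize("NFC", chr(cp)) == chr(cp)
    for cp in UNIFIED_IN_COMPAT_BLOCK
)
print(unchanged)  # True

# U+FA20 is a duplicate of U+8612 and is replaced by it under normalization.
duplicate = chr(0xFA20)
print(unicodedata.decomposition(duplicate))                    # 8612
print(unicodedata.normalize("NFC", duplicate) == chr(0x8612))  # True
```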

Charts

F900–FAFF.

Sources

Note: All characters appear in more than one source, so the sum of individual character counts (40) is greater than the number of encoded characters (12).[21]

Country or region | Code | Source[23] | Character count | Total
China | GU | No source (the original source reference may have been moved) | 12 | 12
Japan | J3 | JIS X 0213:2004 Level 3 | 3 | 12
 | J4 | JIS X 0213:2004 Level 4 | 3 |
 | JA | Japanese IT Vendors Contemporary Ideographs, 1993 | 1 |
 | JA3 | JIS X 0213:2004 level-3 characters replacing JA characters | 1 |
 | JMJ | Character Information Development and Maintenance Project for e-Government "MojiJoho-Kiban Project" (文字情報基盤整備事業) | 4 |
Taiwan | TF | CNS 11643-2007 plane 15 | 1 | 1
Vietnam | V0 | TCVN 5773:1993 | 3 | 3
n/a | UTC | UTC sources | 12 | 12

Known issues

Disunification

U+4039

The character U+4039 (䀹) was a unification of two different characters (one with jiā 夾 phonetic and one with shǎn 㚒 phonetic) until Unicode 5.0. However, they were lexically different characters that should not have been unified; they have different pronunciations and different meanings.

The proposal to disunify U+4039[26] was accepted for Unicode 5.1, encoding a new character at U+9FC3 (鿃) to represent shǎn.

In CJK Unified Ideographs Extension B, some characters are incorrectly unified with others. These characters include U+2017B (𠅻), U+204AF (𠒯) and U+24CB2 (𤲲). The first two wrongly unify glyphs from Chinese Mainland and Vietnamese sources, while the last one wrongly unifies glyphs from Chinese Mainland and Taiwanese sources.[27]

Also in CJK Unified Ideographs Extension B, hundreds of glyph variants were encoded by mistake.[28] Additionally, an ISO/IEC JTC 1/SC 2 report has found that six exact duplicates (where the same character has inadvertently been encoded twice) and two semi-duplicates (where the CJK-B character represents a de facto disunification of two glyph forms unified in the corresponding BMP character) were encoded by mistake:[29]

  • U+34A8 㒨 = U+20457 𠑗: U+20457 is the same as the China-source glyph for U+34A8, but it is significantly different from the Taiwan-source glyph for U+34A8
  • U+3DB7 㶷 = U+2420E 𤈎: same glyph shapes
  • U+8641 虁 = U+27144 𧅄: U+27144 is the same as the Korean-source glyph for U+8641, but it is significantly different from the Chinese Mainland-, Taiwan- and Japan-source glyphs for U+8641
  • U+204F2 𠓲 = U+23515 𣔕: same glyph shapes, but ordered under different radicals
  • U+249BC 𤦼 = U+249E9 𤧩: same glyph shapes
  • U+24BD2 𤯒 = U+2A415 𪐕: same glyph shapes, but ordered under different radicals
  • U+26842 𦡂 = U+26866 𦡦: same glyph shapes
  • U+FA23 﨣 = U+27EAF 𧺯: same glyph shapes (U+FA23 﨣 is a unified CJK ideograph, despite its name “CJK COMPATIBILITY IDEOGRAPH-FA23”)
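
Because these pairs are not related by any Unicode normalization form, software that wants to treat them as equal (for example when matching search terms) has to fold them itself. The sketch below shows one way to do that in Python. The folding policy, which maps the second code point of each pair onto the first, is an assumption made for this example and is not prescribed by the defect report or by Unicode; the names `DUPLICATE_PAIRS` and `fold_duplicates` are likewise hypothetical.

```python
# Pairs from the list above, written as (first code point, second code point).
DUPLICATE_PAIRS = [
    (0x34A8, 0x20457),
    (0x3DB7, 0x2420E),
    (0x8641, 0x27144),
    (0x204F2, 0x23515),
    (0x249BC, 0x249E9),
    (0x24BD2, 0x2A415),
    (0x26842, 0x26866),
    (0xFA23, 0x27EAF),
]

# Translation table that replaces the second member of a pair with the first.
FOLD_TABLE = str.maketrans({chr(dup): chr(keep) for keep, dup in DUPLICATE_PAIRS})


def fold_duplicates(text: str) -> str:
    """Replace each listed duplicate with its counterpart; leave other text alone."""
    return text.translate(FOLD_TABLE)


print(fold_duplicates(chr(0x2420E)) == chr(0x3DB7))  # True
print(fold_duplicates("abc"))                        # abc
```

For the two semi-duplicates (the U+34A8 and U+8641 pairs) folding discards a regional glyph distinction, so an application may prefer to leave those two entries out of the table.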

Apart from the ten blocks of “Unified Ideographs,” Unicode has about a dozen more blocks with not-unified CJK-characters. These are mainly CJK radicals, strokes, punctuation, marks, symbols and compatibility characters. Although some characters have their (decomposable) counterparts in other blocks, the usages can be different. An example of a not-unified CJK-character is U+3007 〇 IDEOGRAPHIC NUMBER ZERO in the CJK Symbols and Punctuation block. Although it is not covered under “CJK Unified Ideographs”, it is treated as a CJK-character for all other intents and purposes.[30]

Four blocks of compatibility characters are included for compatibility with legacy text handling systems and older character sets:

They include forms of characters for vertical text layout and rich text characters that Unicode recommends handling through other means. Therefore, their use is discouraged.

Font support

The blocks CJK Unified Ideographs and CJK Unified Ideographs Extension A, being parts of the Basic Multilingual Plane, are supported by the majority of CJK fonts. However, Japanese and Korean fonts usually have fewer characters (about 13,000 and 8,000, respectively) than Chinese fonts. Extensions B, C and D are supported by the additional fonts MingLiU-ExtB, MingLiU_HKSCS-ExtB, PMingLiU-ExtB and SimSun-ExtB, which have been included in Microsoft Windows since Vista.[31]

CJK unified ideographs additions per Unicode version
Unicode version | Addition | Plane | Characters added | Total characters
1.0 (1991) | CJK Unified Ideographs | Basic Multilingual Plane (BMP) | 20,902 | 20,914
 | CJK Compatibility Ideographs | BMP | 12 |
3.0 (1999) | CJK Unified Ideographs Extension A | BMP | 6,582 | 27,496
3.1 (2001) | CJK Unified Ideographs Extension B | Supplementary Ideographic Plane (SIP) | 42,711 | 70,207
4.1 (2005) | CJK Unified Ideographs: Ideographs from HKSCS-2004 and GB 18030-2000 not in ISO 10646 | BMP | 22 | 70,229
5.1 (2008) | CJK Unified Ideographs: Ideographs from Adobe Japan and disunification of U+4039 | BMP | 8 | 70,237
5.2 (2009) | CJK Unified Ideographs Extension C | SIP | 4,149 | 74,394
 | 8 other characters from ARIB #47, #95, #93 and HKSCS | BMP | 8 |
6.0 (2010) | CJK Unified Ideographs Extension D | SIP | 222 | 74,616
6.1 (2012) | 1 character corresponding to Adobe-Japan1-6 CID+20156 | BMP | 1 | 74,617
8.0 (2015) | CJK Unified Ideographs Extension E | SIP | 5,762 | 80,388
 | 9 other characters | BMP | 9 |
10.0 (2017) | CJK Unified Ideographs Extension F | SIP | 7,473 | 87,882
 | 21 other characters | BMP | 21 |
11.0 (2018) | CJK Unified Ideographs | BMP | 5 | 87,887
13.0 (2020) | CJK Unified Ideographs | BMP | 13 | 92,856
 | CJK Unified Ideographs Extension A | BMP | 10 |
 | CJK Unified Ideographs Extension B | SIP | 7 |
 | CJK Unified Ideographs Extension G | Tertiary Ideographic Plane (TIP) | 4,939 |
14.0 (2021) | CJK Unified Ideographs | BMP | 3 | 92,865
 | CJK Unified Ideographs Extension B | SIP | 2 |
 | CJK Unified Ideographs Extension C | SIP | 4 |
15.0 (2022) | CJK Unified Ideographs Extension C | SIP | 1 | 97,058
 | CJK Unified Ideographs Extension H | TIP | 4,192 |
15.1 (2023) | CJK Unified Ideographs Extension I | SIP | 622 | 97,680
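
The "Total characters" column is a running sum of the "Characters added" column. The short Python sketch below recomputes it from the per-version additions in the table; the list and variable names are illustrative only.

```python
from itertools import accumulate

# Characters added per Unicode version, summed over all rows of that version.
ADDED_PER_VERSION = [
    ("1.0", 20_902 + 12),
    ("3.0", 6_582),
    ("3.1", 42_711),
    ("4.1", 22),
    ("5.1", 8),
    ("5.2", 4_149 + 8),
    ("6.0", 222),
    ("6.1", 1),
    ("8.0", 5_762 + 9),
    ("10.0", 7_473 + 21),
    ("11.0", 5),
    ("13.0", 13 + 10 + 7 + 4_939),
    ("14.0", 3 + 2 + 4),
    ("15.0", 1 + 4_192),
    ("15.1", 622),
]

versions = [version for version, _added in ADDED_PER_VERSION]
running = list(accumulate(added for _version, added in ADDED_PER_VERSION))
totals = dict(zip(versions, running))

print(totals["5.2"])    # 74394
print(totals["13.0"])   # 92856
print(totals["15.1"])   # 97680
```

The final running total matches the 97,680 unified ideographs quoted for the current repertoire.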

References

Footnotes

  1. “Unicode 16.0 UCD: PropList.txt”. 2024-05-31. Retrieved 2024-09-14.

  2. IRG Convenor (2024-12-10). “IRG Experts List”. ISO/IEC JTC1/SC2/WG2/IRG N2769.

  3. Lunde, Ken (2024-09-13). “US/Unicode Activity Report for IRG #63 Meeting” (PDF). ISO/IEC JTC1/SC2/WG2/IRG N2700.

  4. “Unicode 16.0 UCD: Unihan: Unihan_IRGSources.txt”. 2024-07-31. Retrieved 2024-09-10.

  5. Lunde, Ken (2024-07-31). “UAX #45: U-source Ideographs”. Unicode Consortium.

  6. “18.1.7. Han Ideograph Arrangement”. The Unicode Standard: Core Specification. Version 16.0.0. Unicode Consortium.

  7. “3.3. Dictionary Indices”. Unicode Han Database (Unihan). UAX #38. Three of the dictionary properties represent official IRG indices for the dictionaries used in the four dictionary sorting algorithm. Two (kIRGHanyuDaZidian and kIRGKangXi) are still being used by the IRG, but the other one (kIRGDaeJaweon) is not.

  8. Lunde, Ken (2022-09-01). “Proposal to remove/improve provisional Unihan database properties” (PDF). p. 6. UTC L2/22-188. In addition, the IRG no longer uses this dictionary for its ongoing work.

  9. “kIRG_GSource”. Unicode Han Database (Unihan). UAX #38. GKX: Kangxi Dictionary ideographs (康熙字典) 9th edition (1958) including the addendum (康熙字典)補遺. GHZ: Hanyu Dazidian ideographs (漢語大字典).

  10. Lunde, Ken (2018-02-22). “Proposed kIRG_GSource Changes & Corrections” (PDF). UTC L2/18-065; ISO/IEC JTC1/SC2/WG2/IRG N2297.

  11. “2. Text File Data”. U-Source Ideographs. Unicode Consortium. UAX #45. A KangXi dictionary index for the ideograph, as described in Unicode Standard Annex #38, “Unicode Han Database (Unihan)” [UAX38]. This field is no longer used and contains no data.

  12. Lunde, Ken (2024-09-30). “Proposal to remove FS (first residual stroke) value from submissions” (PDF). ISO/IEC JTC1/SC2/WG2/IRG N2713. This document proposes that the inclusion of first residual stroke (aka FS) values be removed from the submission requirements for new CJK Unified Ideographs […] The ISO/IEC 10646 Project Editor, when compiling an IRG working set into a new CJK Unified Ideographs extension block, uses the FS values to sort ideographs that share the same Radical-Stroke (Radical + SC) value.

  13. Lunde, Ken (2012-09-16). “URO”. CJK Type Blog. Adobe Inc.

  14. The Unicode Standard 4.0, Appendix A - Han Unification History.

  15. Suzanne Topping, “The secret life of Unicode”. Archived from the original on 2007-11-14. Retrieved 2010-05-12.

  16. Lu, Qin (2015-06-08). “The Proposed Hong Kong Character Set” (PDF). ISO/IEC JTC1/SC2/WG2/IRG N2074.

  17. “Chapter 11 - East Asian scripts”, The Unicode Standard, 4.0.

  18. “Ideographic Variation Database”. 2022-09-13. Retrieved 2022-09-20.

  19. “IVD Stats”. 2022-09-13. Retrieved 2022-09-20.

  20. PRI 108: Combined registration of the Adobe Japan1 collection and of sequences in that collection.

  21. “Unihan_IRGSources.txt (from Unihan.zip)”. 2023-07-15. Retrieved 2024-09-10.

  22. “Unicode 13.0.0”. 10 March 2020. Retrieved 10 March 2020.

  23. “Unicode 15.0.0”. 13 September 2022. Retrieved 14 September 2022.

  24. “Unicode 15.1.0”. 2023-09-12. Retrieved 2023-09-12.

  25. “UAX #38: Unicode Han Database (Unihan)”. Unicode Consortium. 2024-07-31.

  26. Andrew West and John Jenkins, proposal of disunification of U+4039.

  27. Eiso Chan (陈永聪), Comments on four error glyphs on CJK Unified Ideographs Ext B & E.

  28. Taichi Kawabata. “IRGN1155 Possible Duplicates” (.zip). Retrieved 2019-06-22.

  29. Cook, Richard (6 October 2003). “Defect Report on Duplicate Encoded CJK Forms” (PDF). ISO/IEC JTC1/SC2/WG2. Retrieved 2012-03-28.

  30. GB/T 15835-2011《出版物上数字用法》. China Guojia Biaozhun. https://journals.usst.edu.cn/uploadfile/file/GBT%2015835-2011%E3%80%8A%E5%87%BA%E7%89%88%E7%89%A9%E4%B8%8A%E6%95%B0%E5%AD%97%E7%94%A8%E6%B3%95%E3%80%8B.pdf

  31. Lunde, Ken (2009). CJKV Information Processing. O’Reilly. pp. 633–634. ISBN 978-0-596-51447-1.