source: trunk/third/glib2/tests/utf8.txt @ 18159

Revision 18159, 3.2 KB checked in by ghudson, 22 years ago
This commit was generated by cvs2svn to compensate for changes in r18158, which included commits to RCS files with non-trunk default branches.
# This file is derived from
#
#    http://www.cl.cam.ac.uk/~mgk25/ucs/examples/UTF-8-test.txt
#
# which was created by Markus Kuhn <mkuhn@acm.org> on 2000-09-02.
#
# Lines beginning with # and blank lines are ignored.
#
# Beyond that, this file consists of a series of test cases. Each test case consists of
# 2 or 3 lines:
#
#  1. A UTF-8 string
#  2. A status
#      VALID      : The string is a valid UTF-8 representation of valid Unicode
#      INCOMPLETE : The string has a partial character at the end
#      NOTUNICODE : The string is valid UTF-8, but the characters represented
#                   are not valid Unicode
#      OVERLONG   : The string includes overlong sequences
#      MALFORMED  : The string is not valid UTF-8
#  3. If the status is VALID or NOTUNICODE, the UCS-4 representation of the string,
#     as a series of hex numbers.

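# A minimal sketch of a reader for the record format described above, written
# in Python. It is an illustration under stated assumptions, not the harness
# that consumes this file: the names Utf8Case and parse_cases are hypothetical,
# the input is handled as raw bytes because many test strings are deliberately
# not decodable, and a test string that itself begins with '#' would be
# skipped as a comment.

```python
from dataclasses import dataclass
from typing import Iterator, List, Optional

STATUSES = {"VALID", "INCOMPLETE", "NOTUNICODE", "OVERLONG", "MALFORMED"}


@dataclass
class Utf8Case:
    raw: bytes                  # line 1: the (possibly invalid) UTF-8 string
    status: str                 # line 2: one of STATUSES
    ucs4: Optional[List[int]]   # line 3: code points, VALID/NOTUNICODE only


def parse_cases(data: bytes) -> Iterator[Utf8Case]:
    """Yield test cases from the raw bytes of a file in this format."""
    # Drop comment lines and blank lines, as the header specifies.
    lines = [ln for ln in data.split(b"\n")
             if ln.strip() and not ln.startswith(b"#")]
    i = 0
    # A dangling final line without a status is ignored in this sketch.
    while i + 1 < len(lines):
        raw = lines[i]
        status = lines[i + 1].decode("ascii", "replace").strip()
        if status not in STATUSES:
            raise ValueError(f"unknown status {status!r} after {raw!r}")
        i += 2
        ucs4 = None
        if status in ("VALID", "NOTUNICODE"):
            if i >= len(lines):
                raise ValueError(f"missing UCS-4 line after {raw!r}")
            # The third line is a whitespace-separated list of hex code points.
            ucs4 = [int(tok.decode("ascii"), 16) for tok in lines[i].split()]
            i += 1
        yield Utf8Case(raw, status, ucs4)
```

# The hex numbers are compared as integers rather than as text, since the
# records pad them to varying widths (for example 0080 and 00010000).
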
# 1  Some correct UTF-8 text
κόσμε
VALID
03ba 1f79 03c3 03bc 03b5

# 2.1  First possible sequence of a certain length
#
# FIXME - handle NULLS?
#
# [ NULL BYTE ]
#VALID
#0000

€
VALID
0080

ࠀ
VALID
0800

𐀀
VALID
00010000

øˆ€€€
NOTUNICODE
00200000

ü„€€€€
NOTUNICODE
04000000


VALID
0000007f

ß¿
VALID
000007ff

ï¿¿
NOTUNICODE
0000ffff

÷¿¿¿
NOTUNICODE
001fffff

û¿¿¿¿
NOTUNICODE
03ffffff

ý¿¿¿¿¿
NOTUNICODE
7fffffff

# 2.3  Other boundary conditions

퟿
VALID
d7ff


VALID
e000

�
VALID
fffd

􏿿
VALID
0010ffff

ô€€
NOTUNICODE
00110000

# 3.1  Unexpected continuation bytes

€
MALFORMED
¿
MALFORMED
€¿
MALFORMED
€¿€
MALFORMED
€¿€¿
MALFORMED
€¿€¿€
MALFORMED
€¿€¿€¿
MALFORMED
€¿€¿€¿€
MALFORMED
€‚ƒ„…†‡ˆ‰Š‹ŒŽ‘’“”•–—˜™š›œžŸ¡¢£¤¥¦§¨©ª«¬­®¯°±²³´µ¶·¸¹º»¼½¾¿
MALFORMED

# 3.2  Lonely start characters

À Á Â Ã Ä Å Æ Ç È É Ê Ë Ì Í Î Ï Ð Ñ Ò Ó Ô Õ Ö × Ø Ù Ú Û Ü Ý Þ ß
MALFORMED
à á â ã ä å æ ç è é ê ë ì í î ï
MALFORMED
ð ñ ò ó ô õ ö ÷
MALFORMED
ø ù ú û
MALFORMED
ü ý
MALFORMED

# 3.3  Sequences with last continuation byte missing

À
INCOMPLETE
à€
INCOMPLETE
ð€€
INCOMPLETE
ø€€€
INCOMPLETE
ü€€€€
INCOMPLETE
ß
INCOMPLETE
ï¿
INCOMPLETE
÷¿¿
INCOMPLETE
û¿¿¿
INCOMPLETE
ý¿¿¿¿
INCOMPLETE

# 3.4  Concatenation of incomplete sequences

Àà€ð€€ø€€€ü€€€€ßï¿÷¿¿û¿¿¿ý¿¿¿¿
MALFORMED

# 3.5  Impossible bytes

þ
MALFORMED
ÿ
MALFORMED
þþÿÿ
MALFORMED

#  Examples of an overlong ASCII character

À¯
OVERLONG
à€¯
OVERLONG
ð€€¯
OVERLONG
ø€€€¯
OVERLONG
ü€€€€¯
OVERLONG

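# The five strings above all carry the same payload, U+002F ('/'), padded out
# to 2, 3, 4, 5 and 6 bytes. A minimal Python sketch of how such overlong
# forms can be produced follows. The helper name encode_fixed_length and the
# LEAD_MARK table are hypothetical, and the sketch uses the original UTF-8
# scheme with sequences of up to 6 bytes, as this file does.

```python
# Marker bits of the lead byte for each multi-byte sequence length.
LEAD_MARK = {2: 0xC0, 3: 0xE0, 4: 0xF0, 5: 0xF8, 6: 0xFC}


def encode_fixed_length(cp: int, length: int) -> bytes:
    """Encode cp in exactly `length` bytes, even if a shorter form exists."""
    if length == 1:
        if not 0 <= cp <= 0x7F:
            raise ValueError("code point does not fit in one byte")
        return bytes([cp])
    if length not in LEAD_MARK:
        raise ValueError("length must be between 1 and 6")
    # The lead byte carries (7 - length) payload bits, and each of the
    # (length - 1) continuation bytes carries 6 more.
    payload_bits = (7 - length) + 6 * (length - 1)
    if not 0 <= cp < (1 << payload_bits):
        raise ValueError("code point does not fit in this length")
    out = []
    for _ in range(length - 1):
        out.append(0x80 | (cp & 0x3F))   # continuation byte: 10xxxxxx
        cp >>= 6
    out.append(LEAD_MARK[length] | cp)   # lead byte with the remaining bits
    return bytes(reversed(out))
```

# For example, encode_fixed_length(0x2F, 2) gives the bytes C0 AF, the first
# test string in this group. A decoder that accepts it would treat those two
# bytes as '/', which is why every string here is expected to be OVERLONG.
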
#  Maximum overlong sequences

Á¿
OVERLONG
àŸ¿
OVERLONG
ð¿¿
OVERLONG
ø‡¿¿¿
OVERLONG
üƒ¿¿¿¿
OVERLONG

# Overlong representation of the NUL character

À€
OVERLONG
à€€
OVERLONG
ð€€€
OVERLONG
ø€€€€
OVERLONG
ü€€€€€
OVERLONG

# Illegal code positions

# Single UTF-16 surrogates

í €
NOTUNICODE
d800

í­¿
NOTUNICODE
db7f

í®€
NOTUNICODE
db80

í¯¿
NOTUNICODE
dbff

í°€
NOTUNICODE
dc00

í¾€
NOTUNICODE
df80

í¿¿
NOTUNICODE
dfff

# Paired UTF-16 surrogates

𐀀
NOTUNICODE
d800 dc00

𐏿
NOTUNICODE
d800 dfff

󯰀
NOTUNICODE
db7f dc00

í­¿í¿¿
NOTUNICODE
db7f dfff

󰀀
NOTUNICODE
db80 dc00

󰏿
NOTUNICODE
db80 dfff

􏰀
NOTUNICODE
dbff dc00

􏿿
NOTUNICODE
dbff dfff

# Other illegal code positions

￾
NOTUNICODE
fffe

ï¿¿
NOTUNICODE
ffff

################
#
# Some more tests, not from Markus Kuhn's file
#

# Mixed plane 0 and higher planes

A𐀀B􏿿C
VALID
41 00010000 42 10ffff 43
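#
# The five statuses used throughout this file can be approximated with a
# small decoder. What follows is a minimal Python sketch, not GLib's
# validation code: the names lead_info, is_unicode and classify are
# hypothetical, and the precedence between statuses is an assumption (the
# first structural problem found wins, and NOTUNICODE is reported only if
# the whole string decodes). Following the expectations in this file, code
# points above 10ffff, the surrogates d800 to dfff, and fffe and ffff count
# as not valid Unicode.

```python
from typing import List, Tuple

# Smallest code point that legitimately needs each sequence length.
MIN_FOR_LENGTH = {1: 0x0, 2: 0x80, 3: 0x800,
                  4: 0x10000, 5: 0x200000, 6: 0x4000000}


def lead_info(b: int) -> Tuple[int, int]:
    """Return (sequence length, payload bits) for a lead byte; length 0 = invalid."""
    if b < 0x80:
        return 1, b
    if b < 0xC0:
        return 0, 0            # continuation byte where a lead byte is expected
    if b < 0xE0:
        return 2, b & 0x1F
    if b < 0xF0:
        return 3, b & 0x0F
    if b < 0xF8:
        return 4, b & 0x07
    if b < 0xFC:
        return 5, b & 0x03
    if b < 0xFE:
        return 6, b & 0x01
    return 0, 0                # 0xFE and 0xFF never start a sequence


def is_unicode(cp: int) -> bool:
    """Apply the code point rules this file's expectations imply (assumption)."""
    return (cp <= 0x10FFFF
            and not 0xD800 <= cp <= 0xDFFF
            and cp not in (0xFFFE, 0xFFFF))


def classify(data: bytes) -> Tuple[str, List[int]]:
    """Return (status, code points); code points are empty on structural errors."""
    cps: List[int] = []
    status = "VALID"
    i = 0
    while i < len(data):
        length, cp = lead_info(data[i])
        if length == 0:
            return "MALFORMED", []
        for k in range(1, length):
            if i + k >= len(data):
                return "INCOMPLETE", []    # string ends inside a character
            c = data[i + k]
            if c & 0xC0 != 0x80:
                return "MALFORMED", []     # expected a 10xxxxxx byte
            cp = (cp << 6) | (c & 0x3F)
        if cp < MIN_FOR_LENGTH[length]:
            return "OVERLONG", []
        if not is_unicode(cp):
            status = "NOTUNICODE"          # keep decoding to collect code points
        cps.append(cp)
        i += length
    return status, cps
```

# For the last test case above, classify(b"A\xf0\x90\x80\x80B\xf4\x8f\xbf\xbfC")
# returns ("VALID", [0x41, 0x10000, 0x42, 0x10ffff, 0x43]), which matches the
# UCS-4 line of that record.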