Revision history for Perl module Unicode::Collate.

0.73  Sun Mar 6 13:24:22 2011
    - DUCET is updated (for Unicode 6.0.0) as Collate/allkeys.txt.
    ! However no maint perl has supported Unicode 6.0.0 yet;
      wait for 5.14, or try developing 5.13.7 or later.
    ! Please notice that allkeys.txt will be overwritten if you have had
      other allkeys.txt already.
    - The default UCA_Version is 22. Locale/*.pl and Korean.pm are updated.
    - test: compare allkeys.txt's version with Base_Unicode_Version
      in t/default.t.

0.72  Sat Jan 22 17:28:32 2011
    - xs: fix mixing char* and U8*.

0.71  Tue Jan 18 22:29:44 2011
    - t/loc_test.t should not fail without Unicode::Normalize.

0.70  Sun Jan 16 20:31:07 2011
    - Now U::C::Locale->new will use the compiled DUCET via XS if available.
      added some tests in t/loc_test.t.

0.69  Sat Jan 15 19:41:11 2011
    - clarified about XSUB. revised INSTALL in README.
    - xs: flag passed to utf8n_to_uvuni().
    - doc and comments: [perl #81876] Fix typos by Peter J. Acklam.

0.68  Tue Nov 23 20:17:22 2010
    - doc: clarified about (backwards => [ ]) and (backwards => undef).
    - separated t/backwds.t from t/test.t.
    - added cjk_b5.t, cjk_gb.t, cjk_ja.t, cjk_ko.t, cjk_py.t, cjk_st.t in t
      for CJK/*.pm without Locale.pm.

0.67  Sun Nov 14 11:38:59 2010
    - supported UCA_Version 22 for Unicode 6.0.0.
      * 2B740..2B81D are new CJK unified ideographs.
      * noncharacters (e.g. U+FFFF) should be overridable, not be ignored.
    ! DUCET is NOT updated, as no maint perl supports Unicode 6.0.0.
      Thus the default UCA_Version is still 20.
    - added t/nonchar.t.
    - improved discontiguous contractions of 3 or more characters.
      (e.g. 0FB2 0F71 0F80 and 0FB3 0F71 0F80)
    - auxiliary: now 'mklocale' also copes with Korean.pm according to DUCET.

0.66  Sun Nov 7 10:47:30 2010
    - U::C::Locale newly supports locale: ko.
    - added Unicode::Collate::CJK::Korean for ko.
    - added t/loc_ko.t.
    - 12 compat. ideographs (e.g. U+FA0E) are treated as unified ideographs.
      (though DUCET also does it, now Unicode::Collate does it without DUCET.)
    - added t/compatui.t.
    ! Ideographs Ext.B (U+20000..U+2A6D6) can be overridden with UCA_Version 8.
      This is a long-standing behavior from Unicode::Collate 0.11 to 0.63.
      A wrong fix at 0.64 should be abandoned.

0.65  Wed Nov 3 13:10:20 2010
    - U::C::Locale newly supports locale: zh and some of its variants.
      (zh__big5han, zh__gb2312han, zh__pinyin, zh__stroke)
    - added Unicode::Collate::CJK::Big5 for zh__big5han.
    - added Unicode::Collate::CJK::GB2312 for zh__gb2312han.
    - added Unicode::Collate::CJK::Pinyin for zh__pinyin.
    - added Unicode::Collate::CJK::Stroke for zh__stroke.
    - added loc_zh.t, loc_zhb5.t, loc_zhgb.t, loc_zhpy.t, loc_zhst.t in t.

0.64  Sun Oct 31 14:17:29 2010
    - U::C::Locale newly supports locale: ja.
    - added Unicode::Collate::CJK::JISX0208 for ja.
    - added loc_ja.t, loc_jait.t, loc_japr.t in t.
    - a subroutine specified in 'overrideCJK' or 'overrideHangul' is allowed
      to return an integer or undef value.
    - fix: Ideographs Ext.B (U+20000..U+2A6D6) are assigned in Unicode 3.1,
      then 'overrideCJK' should not override them with UCA_Version 8.
      !! sorry, this fix is based on a wrong idea. reverted at 0.66. !!
    - separated t/overcjk0.t and t/overcjk1.t from t/override.t.

0.63  Sun Oct 10 22:13:21 2010
    - supported suppress contractions (see 'suppress' in POD).
    - internal for 'hangul_terminator' in getSortKey().
    - U::C::Locale newly supports locales: be, bg, kk, mk, ru, sr.
    - added loc_be.t, loc_bg.t, loc_cyrl.t, loc_kk.t, loc_mk.t, loc_ru.t,
      loc_sr.t in t.
    - added tailoring with U+0340 or U+0341 instead of U+0300 or U+0301.
      (affected locales: hr, is, pl, se, to, wo)

0.62  Wed Oct 6 21:35:54 2010
    - U::C::Locale newly supports locales: ar, hu, hy, se, to, uk.
    - added loc_ar.t, loc_hu.t, loc_hy.t, loc_se.t, loc_to.t, loc_uk.t in t.
    - Vietnamese (vi): added tailoring for U+0340 and U+0341.

0.61  Sat Oct 2 11:41:29 2010
    - U::C::Locale newly supports locales: hr, ig, sq.
    - added loc_hr.t, loc_ig.t, loc_sq.t in t.
    - precomposites of e-dot-below, o-dot-below, o-tilde are tailored as well.
      (affected locales: et, yo)
    - Vietnamese (vi): added contractions for non-blocked decompositions
      * base + dot-below + mark such as a\x{323}\x{306}, \x{1EA1}\x{306} etc.
      * base + tone + horn such as o\x{309}\x{31B}, \x{1ECF}\x{31B} etc.

0.60  Thu Sep 23 21:37:36 2010
    - bug fix: index() [and its friends including gmatch()] didn't remove
      ignorable characters in the substring correctly.
      Thanks for the bug report:
      http://www.xray.mpe.mpg.de/mailing-lists/perl-unicode/2010-09/msg00014.html

    - U::C::Locale newly supports locales: de__phonebook, nso, om, tn, vi.
    - added loc_de.t, loc_deph.t, loc_nso.t, loc_om.t, loc_tn.t, loc_vi.t in t.
    - precomposites of a-breve, a-circ, e-circ, o-circ are tailored as well.
      (affected locales: ro, sk, sv)

0.59  Sun Sep 5 17:03:52 2010
    - U::C::Locale newly supports locales: az, fil, ha, lt, mt, tr, wo, yo.
    - added loc_az.t, loc_fil.t, loc_ha.t, loc_lt.t, loc_mt.t, loc_tr.t,
      loc_wo.t, loc_yo.t in t.
    - precomposites of a-uml, o-uml, and u-uml are tailored as well.
      (affected locales: da, et, fi, fo, is, kl, nb, nn, sk, sv)

0.58  Sun Aug 29 19:56:50 2010
    - U::C::Locale newly supports locales: af, cy, da, fo, haw, is, kl, sw.
    - added loc_af.t, loc_cy.t, loc_da.t, loc_fo.t, loc_haw.t, loc_is.t,
      loc_kl.t, loc_sw.t in t.

0.57  Sun Aug 22 22:39:58 2010
    - U::C::Locale newly supports locales: ca, et, fi, lv, sk, sl.
    - added loc_ca.t, loc_et.t, loc_fi.t, loc_lv.t, loc_sk.t, loc_sl.t in t.

0.56  Sun Aug 8 20:24:03 2010
    - Unicode::Collate::Locale newly supports locales: eo, nb, ro, sv.
    - added loc_eo.t, loc_es.t, loc_estr.t, loc_nb.t, loc_ro.t, loc_sv.t in t.
    ! renamed t/locale_{xy}.t to t/loc_{xy}.t (for safer 8.3 names)
      (loc_cs.t, loc_fr.t, loc_nn.t, loc_pl.t, loc_test.t)

0.55  Sun Aug 1 21:21:23 2010
    - incorporated Unicode::Collate::Locale with some changes. see:
      http://www.xray.mpe.mpg.de/mailing-lists/perl-unicode/2004-03/msg00030.html
    - supported locales: cs, es, es__traditional, fr, nn, pl.
    ! added t/locale*.t that uses DUCET.
      (locale_cs.t, locale_fr.t, locale_nn.t, locale_pl.t, locale_test.t)
    - data/*.txt and mklocale for preparation of Locale/*.pl from DUCET.

0.54  Sun Jul 25 21:37:04 2010
    - Now UCA Revision 20 (based on Unicode 5.2.0).
    - DUCET is also updated (for Unicode 5.2.0) as Collate/allkeys.txt,
      which *is required* to test this module.
    ! Please notice that allkeys.txt will be overwritten if you have had
      other allkeys.txt already.
    - U+9FC4..U+9FCB and U+2A700..U+2B734 are new CJK unified ideographs.
    - Many hangul jamo are assigned (affecting hangul_terminator).

    ! Now XSUB will be built by default. (XSUB needs a C compiler.)
      To build pure perl, run disableXS before Makefile.PL.
    ! DUCET will be compiled when XS is used. Explicitly saying
      <table => 'allkeys.txt'> (or using another table) will prevent
      this module from using the compiled DUCET.

    ! added t/default.t that uses DUCET.

0.53  Sun Feb 14 20:46:27 2010
    - Now UCA Revision 18 (based on Unicode 5.1.0).
    - DUCET is also updated (for Unicode 5.1.0) as Collate/allkeys.txt,
      which is not required to test this module.
    ! Please notice that allkeys.txt will be overwritten if you have had
      other allkeys.txt already.
    - U+9FBC..U+9FC3 are new CJK unified ideographs.

0.52  Thu Oct 13 21:51:09 2005
    - The Unicode::Collate->new method does not destroy user's $_ any longer.
      (thanks to Jon Warbrick for bug report)

0.51  Sun May 29 20:21:19 2005
    - Added the latest DUCET (for Unicode 4.1.0) as Collate/allkeys.txt,
      which is not required to test this module.
    ! Please notice that allkeys.txt will be overwritten if you have had
      other allkeys.txt already.
    - Added INSTALL section in POD.

0.50  Sun May 8 20:26:39 2005
    - Now UCA Revision 14 (based on Unicode 4.1.0).
    - Some tests are modified.
    - Added cjkrange.t, ignor.t, override.t in t.
    - Added META.yml.

0.40  Sat Apr 24 06:54:40 2004
    - Now a table file is searched in @INC.

0.33  Sat Dec 13 14:07:27 2003
    - documentation improvement: in "entry", "overrideHangul", etc.

0.32  Wed Dec 3 23:38:18 2003
    - A matching part from index(), match() etc. will include illegal
      code points (as well as ignorable characters) following a grapheme.
    - Contraction with illegal code point will be invalid.
    - Added t/view.t.
    - Added some tests in t/illegal.t.
    - Separated t/altern.t and t/rearrang.t from t/test.t.
    - modified XSUB internals.

0.31  Sun Nov 16 15:40:15 2003
    - Illegal code points (surrogate and noncharacter; they are definitely
      ignorable) will be distinguished from NULL ("\0");
      but porting is not successful in the case of ((Pure Perl) and
      (Perl 5.7.3 or before)). If perl 5.6.X is used, XSUB may help it
      in place of broken CORE::unpack('U*') in older perl.
    - added illegal.t and illegalp.t in t.
    - added XSUB where some functions are implemented in XSUB.
      Pure Perl is also supported.

0.30  Mon Oct 13 21:26:37 2003
    - fix: Completely ignorable in table should be able to be overridden
      by non-ignorable in entry.
    - fix: Maximum length for contraction must not be shortened
      by a shorter contraction following in table and/or entry.
    - added t/normal.t.
    - some doc fixes

0.29  Mon Oct 13 12:18:23 2003
    - now UCA Version 11 (but no functionality is different from Version 9).
    - supported hangul_terminator.
    - fix: Base_Unicode_Version falsely returns Perl's Unicode version.
      C4 in UTS #10 requires UTS's Unicode version.
    - For variable weighting, 'variable' is recommended
      and 'alternate' is deprecated.
    - added version() method.
    - added hangtype.t, trailwt.t, variable.t, and version.t in t.

0.28  Sat Sep 06 20:16:01 2003
    - Fixed another inconsistency under (normalization => undef):
      Non-contiguous contraction is always neglected.
    - Fixed: according to S2.1 in UTS #10, a blocked combining character
      should not be contracted. One test in t/test.t was wrong, then removed.
    - Added t/contract.t.
    - (normalization => "prenormalized") is able to be used.

0.27  Sun Aug 31 22:23:17 2003
  some improvements:
    - The maximum length of contracted CE was not checked (v0.22 to v0.26).
      Collation of a large string including a first letter of a contraction
      that is not a part of that contraction (say, 'c' of 'ca'
      where 'ch' is defined) was too slow, inefficient.
    - A form name for 'normalization', no longer restricted to
      /^(?:NF)?K?[CD]\z/, will be allowed as long as
      Unicode::Normalize::normalize() accepts it, since Unicode::Normalize
      or UAX #15 may be changed/enhanced in future.
    - When Hangul syllables are decomposed under <normalization => undef>,
      contraction among jamo (LV, VT, LVT) derived from the same
      Hangul syllable is allowed.
    - Added t/hangul.t.

0.26  Sun Aug 03 22:23:17 2003
    - fix: an expansion in which a CE is level 3 ignorable and others are not
      was wrongly made level 3 ignorable as a whole entry.
      (In DUCET, some precomposites in Musical Symbols are so)

0.25  Mon Jun 06 23:20:17 2003
    - fix Makefile.PL.
    - internal tweak (again): pack_U() and unpack_U().

0.24  Thu Apr 02 23:12:54 2003
    - internal tweak for (?un)pack 'U'.

0.23  Wed Sep 04 19:25:20 2002
    - fix: scalar match() no longer returns an lvalue substr ref.
    - fix: "Ignorable after variable" should be made level 3 ignorable
      even if alternate => 'blanked'.
    - Now a grapheme may contain trailing level 2, level 3,
      and completely ignorable characters.

0.22  Mon Sep 02 23:15:14 2002
    - New File: t/index.t.
      (The new t/test.t excludes tests for index.)
    - tweak on index(). POSITION is supported.
    - add match, gmatch, subst, gsubst methods.
    - fix: ignorable after variable in 'shift'-variable weight.

0.21  Sat Aug 03 10:24:00 2002
    - upgrade keys.txt and t/test.t for UCA Version 9.

0.20  Fri Jul 26 02:15:25 2002
    - now UCA Version 9.
    - U+FDD0..U+FDEF are new non-characters.
    - fix: whitespace characters before @backwards etc. in a table file.
    - now values for 'alternate', 'backwards', etc.,
      which are explicitly specified via new(),
      are preferred to those specified in a table file.

0.12  Sun May 05 09:43:10 2002
    - add new methods, ->UCA_Version and ->Base_Unicode_Version.
    - test fix: removed the needless requirement of Unicode::Normalize.
      [reported by David Hand]

0.11  Fri May 03 02:28:10 2002
    - fix: now derived collation elements can be used for Hangul Jamo
      when their weights are not defined.
      [reported by Andreas J. Koenig]
    - fix: rearrangements had not worked.
    - mentioned problem on index() in BUGS.
    - more documents, more tests.
    - tag names for 'alternate' are case-insensitive (i.e. 'SHIFTed' etc.).
    - The <undef> value for the keys "overrideCJK", "overrideHangul",
      "rearrange" has a special behavior (different from default).

0.10  Tue Dec 11 23:26:42 2001
    - now you are allowed to use no table file.
    - fix: fetching CE with two or more combining characters.

0.09  Sun Nov 11 17:02:40:18 2001
    - add the following methods: eq, ne, lt, le, gt, ge.
    - relies on &Unicode::Normalize::getCombinClass()
      in place of %Unicode::Normalize::Combin
      (the hash is not defined in the XS version of Unicode::Normalize).
      then you should install Unicode::Normalize 0.10 or later.
    - now independent of Lingua::KO::Hangul::Util
      (this module does decomposition of Hangul syllables for itself)

0.08  Mon Aug 20 22:40:18 2001
    - add the index method.

0.07  Thu Aug 16 23:42:02 2001
    - rename the module name to Unicode::Collate.

0.06  Thu Aug 16 23:18:36 2001
    - add description of the getSortKey method.

0.05  Mon Aug 13 22:23:11 2001
    - bug fix: on the things of 4.2.1, UTR #10
    - getSortKey returns a string, but not an arrayref.

0.04  Mon Aug 13 22:23:11 2001
    - some bugs are fixed.
    - some tailoring parameters are added.

0.03  Mon Aug 06 06:26:35 2001
    - modify README

0.02  Sun Aug 05 20:20:01 2001
    - some fix

0.01  Sun Jul 29 16:16:15 2001
    - original version; created by h2xs 1.21
      with options -A -X -n Sort::UCA