Updated Unicode-Collate to CPAN version 0.76
Revision history for Perl module Unicode::Collate.

0.76  Sun May 15 10:06:59 2011
    - updated CJK/Pinyin.pm and CJK/Stroke.pm according to CLDR 1.9.1 using
      type='pinyin' alt='short' and type='stroke' alt='short' respectively.

0.75  Sat May 7 21:07:38 2011
    - supported ignore_level2 and rewrite.
    - Added iglevel2.t and rewrite.t in t.

0.74  Mon Mar 21 19:07:38 2011
    - removed sw (Swahili) collation according to CLDR 1.9.
      (removed files: Collate/Locale/sw.pl and data/sw.txt)
    - shifted primary weights of letters > Z for some languages.
      (affected locales: da, fi, fo, kl, nb, nn, sv)

0.73  Sun Mar 6 13:24:22 2011
    - DUCET is updated (for Unicode 6.0.0) as Collate/allkeys.txt.
    ! However, no maint perl supports Unicode 6.0.0 yet;
      wait for 5.14, or try a development release (5.13.7 or later).
    ! Please note that allkeys.txt will be overwritten if you already
      have another allkeys.txt.
    - The default UCA_Version is 22. Locale/*.pl and Korean.pm are updated.
    - test: compare allkeys.txt's version with Base_Unicode_Version
      in t/default.t.

0.72  Sat Jan 22 17:28:32 2011
    - xs: fix mixing char* and U8*.

0.71  Tue Jan 18 22:29:44 2011
    - t/loc_test.t should not fail without Unicode::Normalize.

0.70  Sun Jan 16 20:31:07 2011
    - Now U::C::Locale->new will use the compiled DUCET via XS if available.
      added some tests in t/loc_test.t.

0.69  Sat Jan 15 19:41:11 2011
    - clarified about XSUB. revised INSTALL in README.
    - xs: flag passed to utf8n_to_uvuni().
    - doc and comments: [perl #81876] Fix typos by Peter J. Acklam.

0.68  Tue Nov 23 20:17:22 2010
    - doc: clarified about (backwards => [ ]) and (backwards => undef).
    - separated t/backwds.t from t/test.t.
    - added cjk_b5.t, cjk_gb.t, cjk_ja.t, cjk_ko.t, cjk_py.t, cjk_st.t in t
      for CJK/*.pm without Locale.pm.

0.67  Sun Nov 14 11:38:59 2010
    - supported UCA_Version 22 for Unicode 6.0.0.
      * 2B740..2B81D are new CJK unified ideographs.
      * noncharacters (e.g. U+FFFF) should be overridable, not be ignored.
    ! DUCET is NOT updated, as no maint perl supports Unicode 6.0.0.
      Thus the default UCA_Version is still 20.
    - added t/nonchar.t.
    - improved discontiguous contractions of 3 or more characters.
      (e.g. 0FB2 0F71 0F80 and 0FB3 0F71 0F80)
    - auxiliary: now 'mklocale' also copes with Korean.pm according to DUCET.

0.66  Sun Nov 7 10:47:30 2010
    - U::C::Locale newly supports locale: ko.
    - added Unicode::Collate::CJK::Korean for ko.
    - added t/loc_ko.t.
    - 12 compat. ideographs (e.g. U+FA0E) are treated as unified ideographs.
      (though DUCET also does it, now Unicode::Collate does it without DUCET.)
    - added t/compatui.t.
    ! Ideographs Ext.B (U+20000..U+2A6D6) can be overridden with UCA_Version 8.
      This is a long-standing behavior from Unicode::Collate 0.11 to 0.63.
      A wrong fix at 0.64 should be abandoned.

0.65  Wed Nov 3 13:10:20 2010
    - U::C::Locale newly supports locale: zh and some of its variants.
      (zh__big5han, zh__gb2312han, zh__pinyin, zh__stroke)
    - added Unicode::Collate::CJK::Big5 for zh__big5han.
    - added Unicode::Collate::CJK::GB2312 for zh__gb2312han.
    - added Unicode::Collate::CJK::Pinyin for zh__pinyin.
    - added Unicode::Collate::CJK::Stroke for zh__stroke.
    - added loc_zh.t, loc_zhb5.t, loc_zhgb.t, loc_zhpy.t, loc_zhst.t in t.

0.64  Sun Oct 31 14:17:29 2010
    - U::C::Locale newly supports locale: ja.
    - added Unicode::Collate::CJK::JISX0208 for ja.
    - added loc_ja.t, loc_jait.t, loc_japr.t in t.
    - a subroutine specified in 'overrideCJK' or 'overrideHangul' is allowed
      to return an integer or undef value.
    - fix: Ideographs Ext.B (U+20000..U+2A6D6) are assigned in Unicode 3.1,
      then 'overrideCJK' should not override them with UCA_Version 8.
      !! sorry, this fix is based on a wrong idea. reverted at 0.66. !!
    - separated t/overcjk0.t and t/overcjk1.t from t/override.t.

0.63  Sun Oct 10 22:13:21 2010
    - supported suppress contractions (see 'suppress' in POD).
    - internal for 'hangul_terminator' in getSortKey().
    - U::C::Locale newly supports locales: be, bg, kk, mk, ru, sr.
    - added loc_be.t, loc_bg.t, loc_cyrl.t, loc_kk.t, loc_mk.t, loc_ru.t,
      loc_sr.t in t.
    - added tailoring with U+0340 or U+0341 instead of U+0300 or U+0301.
      (affected locales: hr, is, pl, se, to, wo)

0.62  Wed Oct 6 21:35:54 2010
    - U::C::Locale newly supports locales: ar, hu, hy, se, to, uk.
    - added loc_ar.t, loc_hu.t, loc_hy.t, loc_se.t, loc_to.t, loc_uk.t in t.
    - Vietnamese (vi): added tailoring for U+0340 and U+0341.

0.61  Sat Oct 2 11:41:29 2010
    - U::C::Locale newly supports locales: hr, ig, sq.
    - added loc_hr.t, loc_ig.t, loc_sq.t in t.
    - precomposites of e-dot-below, o-dot-below, o-tilde are tailored as well.
      (affected locales: et, yo)
    - Vietnamese (vi): added contractions for non-blocked decompositions
      * base + dot-below + mark such as a\x{323}\x{306}, \x{1EA1}\x{306} etc.
      * base + tone + horn such as o\x{309}\x{31B}, \x{1ECF}\x{31B} etc.

0.60  Thu Sep 23 21:37:36 2010
    - bug fix: index() [and its friends including gmatch()] didn't remove
      ignorable characters in the substring correctly.
      Thanks for the bug report:
      http://www.xray.mpe.mpg.de/mailing-lists/perl-unicode/2010-09/msg00014.html

    - U::C::Locale newly supports locales: de__phonebook, nso, om, tn, vi.
    - added loc_de.t, loc_deph.t, loc_nso.t, loc_om.t, loc_tn.t, loc_vi.t in t.
    - precomposites of a-breve, a-circ, e-circ, o-circ are tailored as well.
      (affected locales: ro, sk, sv)

0.59  Sun Sep 5 17:03:52 2010
    - U::C::Locale newly supports locales: az, fil, ha, lt, mt, tr, wo, yo.
    - added loc_az.t, loc_fil.t, loc_ha.t, loc_lt.t, loc_mt.t, loc_tr.t,
      loc_wo.t, loc_yo.t in t.
    - precomposites of a-uml, o-uml, and u-uml are tailored as well.
      (affected locales: da, et, fi, fo, is, kl, nb, nn, sk, sv)

0.58  Sun Aug 29 19:56:50 2010
    - U::C::Locale newly supports locales: af, cy, da, fo, haw, is, kl, sw.
    - added loc_af.t, loc_cy.t, loc_da.t, loc_fo.t, loc_haw.t, loc_is.t,
      loc_kl.t, loc_sw.t in t.

0.57  Sun Aug 22 22:39:58 2010
    - U::C::Locale newly supports locales: ca, et, fi, lv, sk, sl.
    - added loc_ca.t, loc_et.t, loc_fi.t, loc_lv.t, loc_sk.t, loc_sl.t in t.

0.56  Sun Aug 8 20:24:03 2010
    - Unicode::Collate::Locale newly supports locales: eo, nb, ro, sv.
    - added loc_eo.t, loc_es.t, loc_estr.t, loc_nb.t, loc_ro.t, loc_sv.t in t.
    ! renamed t/locale_{xy}.t to t/loc_{xy}.t (for safer 8.3 names)
      (loc_cs.t, loc_fr.t, loc_nn.t, loc_pl.t, loc_test.t)

0.55  Sun Aug 1 21:21:23 2010
    - incorporated Unicode::Collate::Locale with some changes. see:
      http://www.xray.mpe.mpg.de/mailing-lists/perl-unicode/2004-03/msg00030.html
    - supported locales: cs, es, es__traditional, fr, nn, pl.
    ! added t/locale*.t that uses DUCET.
      (locale_cs.t, locale_fr.t, locale_nn.t, locale_pl.t, locale_test.t)
    - data/*.txt and mklocale for preparation of Locale/*.pl from DUCET.

0.54  Sun Jul 25 21:37:04 2010
    - Now UCA Revision 20 (based on Unicode 5.2.0).
    - DUCET is also updated (for Unicode 5.2.0) as Collate/allkeys.txt,
      which *is required* to test this module.
    ! Please note that allkeys.txt will be overwritten if you already
      have another allkeys.txt.
    - U+9FC4..U+9FCB and U+2A700..U+2B734 are new CJK unified ideographs.
    - Many hangul jamo are assigned (affecting hangul_terminator).

    ! Now XSUB will be built by default. (XSUB needs a C compiler.)
      To build pure perl, run disableXS before Makefile.PL.
    ! DUCET will be compiled when XS is used. Explicitly specifying
      <table => 'allkeys.txt'> (or using another table) will prevent
      this module from using the compiled DUCET.

    ! added t/default.t that uses DUCET.

0.53  Sun Feb 14 20:46:27 2010
    - Now UCA Revision 18 (based on Unicode 5.1.0).
    - DUCET is also updated (for Unicode 5.1.0) as Collate/allkeys.txt,
      which is not required to test this module.
    ! Please note that allkeys.txt will be overwritten if you already
      have another allkeys.txt.
    - U+9FBC..U+9FC3 are new CJK unified ideographs.

0.52  Thu Oct 13 21:51:09 2005
    - The Unicode::Collate->new method does not destroy user's $_ any longer.
      (thanks to Jon Warbrick for bug report)

0.51  Sun May 29 20:21:19 2005
    - Added the latest DUCET (for Unicode 4.1.0) as Collate/allkeys.txt,
      which is not required to test this module.
    ! Please note that allkeys.txt will be overwritten if you already
      have another allkeys.txt.
    - Added INSTALL section in POD.

0.50  Sun May 8 20:26:39 2005
    - Now UCA Revision 14 (based on Unicode 4.1.0).
    - Some tests are modified.
    - Added cjkrange.t, ignor.t, override.t in t.
    - Added META.yml.

0.40  Sat Apr 24 06:54:40 2004
    - Now a table file is searched in @INC.

0.33  Sat Dec 13 14:07:27 2003
    - documentation improvement: in "entry", "overrideHangul", etc.

0.32  Wed Dec 3 23:38:18 2003
    - A matching part from index(), match() etc. will include illegal
      code points (as well as ignorable characters) following a grapheme.
    - Contraction with illegal code point will be invalid.
    - Added t/view.t.
    - Added some tests in t/illegal.t.
    - Separated t/altern.t and t/rearrang.t from t/test.t.
    - modified XSUB internals.

0.31  Sun Nov 16 15:40:15 2003
    - Illegal code points (surrogate and noncharacter; they are definitely
      ignorable) will be distinguished from NULL ("\0");
      but porting is not successful in the case of ((Pure Perl) and
      (Perl 5.7.3 or before)). If perl 5.6.X is used, XSUB may help it
      in place of broken CORE::unpack('U*') in older perl.
    - added illegal.t and illegalp.t in t.
    - added XSUB where some functions are implemented in XSUB.
      Pure Perl is also supported.

0.30  Mon Oct 13 21:26:37 2003
    - fix: Completely ignorable in table should be able to be overridden
      by non-ignorable in entry.
    - fix: Maximum length for contraction must not be shortened
      by a shorter contraction following in table and/or entry.
    - added t/normal.t.
    - some doc fixes

0.29  Mon Oct 13 12:18:23 2003
    - now UCA Version 11 (but no functionality is different from Version 9).
    - supported hangul_terminator.
    - fix: Base_Unicode_Version falsely returns Perl's Unicode version.
      C4 in UTS #10 requires UTS's Unicode version.
    - For variable weighting, 'variable' is recommended
      and 'alternate' is deprecated.
    - added version() method.
    - added hangtype.t, trailwt.t, variable.t, and version.t in t.

0.28  Sat Sep 06 20:16:01 2003
    - Fixed another inconsistency under (normalization => undef):
      Non-contiguous contraction is always neglected.
    - Fixed: according to S2.1 in UTS #10, a blocked combining character
      should not be contracted. One test in t/test.t was wrong, then removed.
    - Added t/contract.t.
    - (normalization => "prenormalized") can now be used.

0.27  Sun Aug 31 22:23:17 2003
  some improvements:
    - The maximum length of contracted CE was not checked (v0.22 to v0.26).
      Collation of a large string including a first letter of a contraction
      that is not a part of that contraction (say, 'c' of 'ca'
      where 'ch' is defined) was too slow, inefficient.
    - A form name for 'normalization', no longer restricted to
      /^(?:NF)?K?[CD]\z/, will be allowed as long as
      Unicode::Normalize::normalize() accepts it, since Unicode::Normalize
      or UAX #15 may be changed/enhanced in future.
    - When Hangul syllables are decomposed under <normalization => undef>,
      contraction among jamo (LV, VT, LVT) derived from the same
      Hangul syllable is allowed.
    - Added t/hangul.t.

0.26  Sun Aug 03 22:23:17 2003
    - fix: an expansion in which a CE is level 3 ignorable and others are not
      was wrongly made level 3 ignorable as a whole entry.
      (In DUCET, some precomposites in Musical Symbols are so)

0.25  Mon Jun 06 23:20:17 2003
    - fix Makefile.PL.
    - internal tweak (again): pack_U() and unpack_U().

0.24  Thu Apr 02 23:12:54 2003
    - internal tweak for (?un)pack 'U'.

0.23  Wed Sep 04 19:25:20 2002
    - fix: scalar match() no longer returns an lvalue substr ref.
    - fix: "Ignorable after variable" should be made level 3 ignorable
      even if alternate => 'blanked'.
    - Now a grapheme may contain trailing level 2, level 3,
      and completely ignorable characters.

0.22  Mon Sep 02 23:15:14 2002
    - New File: t/index.t.
      (The new t/test.t excludes tests for index.)
    - tweak on index(). POSITION is supported.
    - add match, gmatch, subst, gsubst methods.
    - fix: ignorable after variable in 'shift'-variable weight.

0.21  Sat Aug 03 10:24:00 2002
    - upgrade keys.txt and t/test.t for UCA Version 9.

0.20  Fri Jul 26 02:15:25 2002
    - now UCA Version 9.
    - U+FDD0..U+FDEF are new non-characters.
    - fix: whitespace characters before @backwards etc. in a table file.
    - now values for 'alternate', 'backwards', etc.,
      which are explicitly specified via new(),
      are preferred to those specified in a table file.

0.12  Sun May 05 09:43:10 2002
    - add new methods, ->UCA_Version and ->Base_Unicode_Version.
    - test fix: removed the needless requirement of Unicode::Normalize.
      [reported by David Hand]

0.11  Fri May 03 02:28:10 2002
    - fix: now derived collation elements can be used for Hangul Jamo
      when their weights are not defined.
      [reported by Andreas J. Koenig]
    - fix: rearrangements had not worked.
    - mentioned problem with index() in BUGS.
    - more documents, more tests.
    - tag names for 'alternate' are case-insensitive (i.e. 'SHIFTed' etc.).
    - The <undef> value for the keys "overrideCJK", "overrideHangul",
      "rearrange" has a special behavior (different from default).

0.10  Tue Dec 11 23:26:42 2001
    - now you are allowed to use no table file.
    - fix: fetching CE with two or more combining characters.

0.09  Sun Nov 11 17:02:40:18 2001
    - add the following methods: eq, ne, lt, le, gt, ge.
    - relies on &Unicode::Normalize::getCombinClass()
      in place of %Unicode::Normalize::Combin
      (the hash is not defined in the XS version of Unicode::Normalize).
      then you should install Unicode::Normalize 0.10 or later.
    - now independent of Lingua::KO::Hangul::Util
      (this module does decomposition of Hangul syllables for itself)

0.08  Mon Aug 20 22:40:18 2001
    - add the index method.

0.07  Thu Aug 16 23:42:02 2001
    - rename the module name to Unicode::Collate.

0.06  Thu Aug 16 23:18:36 2001
    - add description of the getSortKey method.

0.05  Mon Aug 13 22:23:11 2001
    - bug fix: on the things of 4.2.1, UTR #10
    - getSortKey returns a string, but not an arrayref.

0.04  Mon Aug 13 22:23:11 2001
    - some bugs are fixed.
    - some tailoring parameters are added.

0.03  Mon Aug 06 06:26:35 2001
    - modify README

0.02  Sun Aug 05 20:20:01 2001
    - some fix

0.01  Sun Jul 29 16:16:15 2001
    - original version; created by h2xs 1.21
      with options -A -X -n Sort::UCA