Update Unicode-Collate to CPAN version 0.71
Revision history for Perl module Unicode::Collate.

0.71 Tue Jan 18 22:29:44 2011
    - t/loc_test.t should not fail without Unicode::Normalize.

0.70 Sun Jan 16 20:31:07 2011
    - Now U::C::Locale->new will use the compiled DUCET via XS if available.
      added some tests in t/loc_test.t.

0.69 Sat Jan 15 19:41:11 2011
    - clarified the description of XSUB; revised INSTALL in README.
    - xs: flag passed to utf8n_to_uvuni().
    - doc and comments: [perl #81876] Fix typos by Peter J. Acklam.

0.68 Tue Nov 23 20:17:22 2010
    - doc: clarified (backwards => [ ]) and (backwards => undef).
    - separated t/backwds.t from t/test.t.
    - added cjk_b5.t, cjk_gb.t, cjk_ja.t, cjk_ko.t, cjk_py.t, cjk_st.t in t
      for CJK/*.pm without Locale.pm.

0.67 Sun Nov 14 11:38:59 2010
    - supported UCA_Version 22 for Unicode 6.0.0.
      * 2B740..2B81D are new CJK unified ideographs.
      * noncharacters (e.g. U+FFFF) should be overridable, not ignored.
    ! DUCET is NOT updated, as no maint perl supports Unicode 6.0.0.
      Thus the default UCA_Version is still 20.
    - added t/nonchar.t.
    - improved discontiguous contractions of 3 or more characters.
      (e.g. 0FB2 0F71 0F80 and 0FB3 0F71 0F80)
    - auxiliary: now 'mklocale' also copes with Korean.pm according to DUCET.

0.66 Sun Nov 7 10:47:30 2010
    - U::C::Locale newly supports locale: ko.
    - added Unicode::Collate::CJK::Korean for ko.
    - added t/loc_ko.t.
    - 12 compat. ideographs (e.g. U+FA0E) are treated as unified ideographs.
      (DUCET also does this; now Unicode::Collate does it even without DUCET.)
    - added t/compatui.t.
    ! Ideographs Ext.B (U+20000..U+2A6D6) can be overridden with UCA_Version 8.
      This is long-standing behavior from Unicode::Collate 0.11 to 0.63.
      The wrong fix in 0.64 is abandoned.

0.65 Wed Nov 3 13:10:20 2010
    - U::C::Locale newly supports locale: zh and some of its variants.
      (zh__big5han, zh__gb2312han, zh__pinyin, zh__stroke)
    - added Unicode::Collate::CJK::Big5 for zh__big5han.
    - added Unicode::Collate::CJK::GB2312 for zh__gb2312han.
    - added Unicode::Collate::CJK::Pinyin for zh__pinyin.
    - added Unicode::Collate::CJK::Stroke for zh__stroke.
    - added loc_zh.t, loc_zhb5.t, loc_zhgb.t, loc_zhpy.t, loc_zhst.t in t.

0.64 Sun Oct 31 14:17:29 2010
    - U::C::Locale newly supports locale: ja.
    - added Unicode::Collate::CJK::JISX0208 for ja.
    - added loc_ja.t, loc_jait.t, loc_japr.t in t.
    - a subroutine specified in 'overrideCJK' or 'overrideHangul' is allowed
      to return an integer or undef value.
    - fix: Ideographs Ext.B (U+20000..U+2A6D6) were assigned in Unicode 3.1,
      so 'overrideCJK' should not override them with UCA_Version 8.
      !! sorry, this fix is based on a wrong idea. reverted in 0.66. !!
    - separated t/overcjk0.t and t/overcjk1.t from t/override.t.

0.63 Sun Oct 10 22:13:21 2010
    - supported suppress contractions (see 'suppress' in POD).
    - internal for 'hangul_terminator' in getSortKey().
    - U::C::Locale newly supports locales: be, bg, kk, mk, ru, sr.
    - added loc_be.t, loc_bg.t, loc_cyrl.t, loc_kk.t, loc_mk.t, loc_ru.t,
      loc_sr.t in t.
    - added tailoring with U+0340 or U+0341 instead of U+0300 or U+0301.
      (affected locales: hr, is, pl, se, to, wo)

0.62 Wed Oct 6 21:35:54 2010
    - U::C::Locale newly supports locales: ar, hu, hy, se, to, uk.
    - added loc_ar.t, loc_hu.t, loc_hy.t, loc_se.t, loc_to.t, loc_uk.t in t.
    - Vietnamese (vi): added tailoring for U+0340 and U+0341.

0.61 Sat Oct 2 11:41:29 2010
    - U::C::Locale newly supports locales: hr, ig, sq.
    - added loc_hr.t, loc_ig.t, loc_sq.t in t.
    - precomposites of e-dot-below, o-dot-below, o-tilde are tailored as well.
      (affected locales: et, yo)
    - Vietnamese (vi): added contractions for non-blocked decompositions
      * base + dot-below + mark such as a\x{323}\x{306}, \x{1EA1}\x{306} etc.
      * base + tone + horn such as o\x{309}\x{31B}, \x{1ECF}\x{31B} etc.

0.60 Thu Sep 23 21:37:36 2010
    - bug fix: index() [and its friends including gmatch()] didn't remove
      ignorable characters in the substring correctly.
      Thanks for the bug report:
      http://www.xray.mpe.mpg.de/mailing-lists/perl-unicode/2010-09/msg00014.html

    - U::C::Locale newly supports locales: de__phonebook, nso, om, tn, vi.
    - added loc_de.t, loc_deph.t, loc_nso.t, loc_om.t, loc_tn.t, loc_vi.t in t.
    - precomposites of a-breve, a-circ, e-circ, o-circ are tailored as well.
      (affected locales: ro, sk, sv)

0.59 Sun Sep 5 17:03:52 2010
    - U::C::Locale newly supports locales: az, fil, ha, lt, mt, tr, wo, yo.
    - added loc_az.t, loc_fil.t, loc_ha.t, loc_lt.t, loc_mt.t, loc_tr.t,
      loc_wo.t, loc_yo.t in t.
    - precomposites of a-uml, o-uml, and u-uml are tailored as well.
      (affected locales: da, et, fi, fo, is, kl, nb, nn, sk, sv)

0.58 Sun Aug 29 19:56:50 2010
    - U::C::Locale newly supports locales: af, cy, da, fo, haw, is, kl, sw.
    - added loc_af.t, loc_cy.t, loc_da.t, loc_fo.t, loc_haw.t, loc_is.t,
      loc_kl.t, loc_sw.t in t.

0.57 Sun Aug 22 22:39:58 2010
    - U::C::Locale newly supports locales: ca, et, fi, lv, sk, sl.
    - added loc_ca.t, loc_et.t, loc_fi.t, loc_lv.t, loc_sk.t, loc_sl.t in t.

0.56 Sun Aug 8 20:24:03 2010
    - Unicode::Collate::Locale newly supports locales: eo, nb, ro, sv.
    - added loc_eo.t, loc_es.t, loc_estr.t, loc_nb.t, loc_ro.t, loc_sv.t in t.
    ! renamed t/locale_{xy}.t to t/loc_{xy}.t (for safer 8.3 names)
      (loc_cs.t, loc_fr.t, loc_nn.t, loc_pl.t, loc_test.t)

0.55 Sun Aug 1 21:21:23 2010
    - incorporated Unicode::Collate::Locale with some changes. see:
      http://www.xray.mpe.mpg.de/mailing-lists/perl-unicode/2004-03/msg00030.html
    - supported locales: cs, es, es__traditional, fr, nn, pl.
    ! added t/locale*.t that use DUCET.
      (locale_cs.t, locale_fr.t, locale_nn.t, locale_pl.t, locale_test.t)
    - data/*.txt and mklocale for preparation of Locale/*.pl from DUCET.

0.54 Sun Jul 25 21:37:04 2010
    - Now UCA Revision 20 (based on Unicode 5.2.0).
    - DUCET is also updated (for Unicode 5.2.0) as Collate/allkeys.txt,
      which *is required* to test this module.
    ! Please note that allkeys.txt will be overwritten if you already have
      another allkeys.txt.
    - U+9FC4..U+9FCB and U+2A700..U+2B734 are new CJK unified ideographs.
    - Many Hangul jamo are newly assigned (affecting hangul_terminator).

    ! Now XSUB will be built by default. (XSUB needs a C compiler.)
      To build a pure Perl version, run disableXS before Makefile.PL.
    ! DUCET will be compiled when XS is used. Explicitly specifying
      <table => 'allkeys.txt'> (or using another table) will prevent
      this module from using the compiled DUCET.

    ! added t/default.t that uses DUCET.

0.53 Sun Feb 14 20:46:27 2010
    - Now UCA Revision 18 (based on Unicode 5.1.0).
    - DUCET is also updated (for Unicode 5.1.0) as Collate/allkeys.txt,
      which is not required to test this module.
    ! Please note that allkeys.txt will be overwritten if you already have
      another allkeys.txt.
    - U+9FBC..U+9FC3 are new CJK unified ideographs.

0.52 Thu Oct 13 21:51:09 2005
    - The Unicode::Collate->new method no longer destroys the user's $_.
      (thanks to Jon Warbrick for the bug report)

0.51 Sun May 29 20:21:19 2005
    - Added the latest DUCET (for Unicode 4.1.0) as Collate/allkeys.txt,
      which is not required to test this module.
    ! Please note that allkeys.txt will be overwritten if you already have
      another allkeys.txt.
    - Added INSTALL section in POD.

0.50 Sun May 8 20:26:39 2005
    - Now UCA Revision 14 (based on Unicode 4.1.0).
    - Some tests are modified.
    - Added cjkrange.t, ignor.t, override.t in t.
    - Added META.yml.

0.40 Sat Apr 24 06:54:40 2004
    - Now a table file is searched for in @INC.

0.33 Sat Dec 13 14:07:27 2003
    - documentation improvement: in "entry", "overrideHangul", etc.

0.32 Wed Dec 3 23:38:18 2003
    - A matching part from index(), match() etc. will include illegal
      code points (as well as ignorable characters) following a grapheme.
    - A contraction containing an illegal code point will be invalid.
    - Added t/view.t.
    - Added some tests in t/illegal.t.
    - Separated t/altern.t and t/rearrang.t from t/test.t.
    - modified XSUB internals.

0.31 Sun Nov 16 15:40:15 2003
    - Illegal code points (surrogate and noncharacter; they are definitely
      ignorable) will be distinguished from NULL ("\0");
      but porting is not successful in the case of ((Pure Perl) and
      (Perl 5.7.3 or before)). If perl 5.6.X is used, XSUB may help by
      replacing the broken CORE::unpack('U*') of older perls.
    - added illegal.t and illegalp.t in t.
    - added XSUB: some functions are implemented in XSUB.
      Pure Perl is also supported.

0.30 Mon Oct 13 21:26:37 2003
    - fix: a completely ignorable in the table should be overridable
      by a non-ignorable in entry.
    - fix: Maximum length for contraction must not be shortened
      by a shorter contraction that follows in the table and/or entry.
    - added t/normal.t.
    - some doc fixes

0.29 Mon Oct 13 12:18:23 2003
    - now UCA Version 11 (but functionally the same as Version 9).
    - supported hangul_terminator.
    - fix: Base_Unicode_Version falsely returned Perl's Unicode version.
      C4 in UTS #10 requires UTS's Unicode version.
    - For variable weighting, 'variable' is recommended
      and 'alternate' is deprecated.
    - added version() method.
    - added hangtype.t, trailwt.t, variable.t, and version.t in t.

0.28 Sat Sep 06 20:16:01 2003
    - Fixed another inconsistency under (normalization => undef):
      Non-contiguous contraction is always neglected.
    - Fixed: according to S2.1 in UTS #10, a blocked combining character
      should not be contracted. One test in t/test.t was wrong and has
      been removed.
    - Added t/contract.t.
    - (normalization => "prenormalized") can be used.

0.27 Sun Aug 31 22:23:17 2003
    some improvements:
    - The maximum length of contracted CE was not checked (v0.22 to v0.26).
      Collation of a large string including a first letter of a contraction
      that is not a part of that contraction (say, 'c' of 'ca'
      where 'ch' is defined) was too slow.
    - A form name for 'normalization', no longer restricted to
      /^(?:NF)?K?[CD]\z/, will be allowed as long as
      Unicode::Normalize::normalize() accepts it, since Unicode::Normalize
      or UAX #15 may be changed/enhanced in the future.
    - When Hangul syllables are decomposed under <normalization => undef>,
      contraction among jamo (LV, VT, LVT) derived from the same
      Hangul syllable is allowed.
    - Added t/hangul.t.

0.26 Sun Aug 03 22:23:17 2003
    - fix: an expansion in which a CE is level 3 ignorable and others are not
      was wrongly made level 3 ignorable as a whole entry.
      (In DUCET, some precomposites in Musical Symbols are so)

0.25 Mon Jun 06 23:20:17 2003
    - fix Makefile.PL.
    - internal tweak (again): pack_U() and unpack_U().

0.24 Thu Apr 02 23:12:54 2003
    - internal tweak for (un)pack 'U'.

0.23 Wed Sep 04 19:25:20 2002
    - fix: scalar match() no longer returns an lvalue substr ref.
    - fix: "Ignorable after variable" should be made level 3 ignorable
      even if alternate => 'blanked'.
    - Now a grapheme may contain trailing level 2, level 3,
      and completely ignorable characters.

0.22 Mon Sep 02 23:15:14 2002
    - New File: t/index.t.
      (The new t/test.t excludes tests for index.)
    - tweak on index(). POSITION is supported.
    - add match, gmatch, subst, gsubst methods.
    - fix: ignorable after variable in 'shift'-variable weight.

0.21 Sat Aug 03 10:24:00 2002
    - upgrade keys.txt and t/test.t for UCA Version 9.

0.20 Fri Jul 26 02:15:25 2002
    - now UCA Version 9.
    - U+FDD0..U+FDEF are new non-characters.
    - fix: whitespace characters before @backwards etc. in a table file.
    - now values for 'alternate', 'backwards', etc.,
      which are explicitly specified via new(),
      are preferred to those specified in a table file.

0.12 Sun May 05 09:43:10 2002
    - add new methods, ->UCA_Version and ->Base_Unicode_Version.
    - test fix: removed the needless requirement of Unicode::Normalize.
      [reported by David Hand]

0.11 Fri May 03 02:28:10 2002
    - fix: now derived collation elements can be used for Hangul Jamo
      when their weights are not defined.
      [reported by Andreas J. Koenig]
    - fix: rearrangements had not worked.
    - mentioned a problem with index() in BUGS.
    - more documentation, more tests.
    - tag names for 'alternate' are case-insensitive (e.g. 'SHIFTed').
    - The <undef> value for the keys "overrideCJK", "overrideHangul", and
      "rearrange" has a special behavior (different from default).

0.10 Tue Dec 11 23:26:42 2001
    - now a table file is optional.
    - fix: fetching CE with two or more combining characters.

0.09 Sun Nov 11 17:02:40:18 2001
    - add the following methods: eq, ne, lt, le, gt, ge.
    - relies on &Unicode::Normalize::getCombinClass()
      in place of %Unicode::Normalize::Combin
      (the hash is not defined in the XS version of Unicode::Normalize),
      so you should install Unicode::Normalize 0.10 or later.
    - now independent of Lingua::KO::Hangul::Util
      (this module decomposes Hangul syllables itself)

0.08 Mon Aug 20 22:40:18 2001
    - add the index method.

0.07 Thu Aug 16 23:42:02 2001
    - rename the module to Unicode::Collate.

0.06 Thu Aug 16 23:18:36 2001
    - add description of the getSortKey method.

0.05 Mon Aug 13 22:23:11 2001
    - bug fix: regarding 4.2.1 of UTR #10
    - getSortKey returns a string, not an arrayref.

0.04 Mon Aug 13 22:23:11 2001
    - some bugs are fixed.
    - some tailoring parameters are added.

0.03 Mon Aug 06 06:26:35 2001
    - modify README

0.02 Sun Aug 05 20:20:01 2001
    - some fixes

0.01 Sun Jul 29 16:16:15 2001
    - original version; created by h2xs 1.21
      with options -A -X -n Sort::UCA