The *.txt files were copied from

    ftp://www.unicode.org/Public/UNIDATA

with subdirectories 'extracted' and 'auxiliary'

The Unihan files were not included due to space considerations. Also NOT
included were any *.html files. It is possible to add the Unihan files, and
edit mktables (see instructions near its beginning) to look at them.

The file 'version' should exist and be a single line with the Unicode version,
like:
5.2.0
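A quick sanity check of the 'version' file could look like the sketch below.
check_version_file is a hypothetical helper, not something shipped with Perl,
and the X.Y.Z pattern it tests for is an assumption based on the example
above.

```shell
# Hypothetical helper, not part of the Perl source tree.
# Succeeds only if the file exists, holds exactly one newline-terminated
# line, and that line looks like a dotted version such as 5.2.0.
# (The X.Y.Z shape is an assumption taken from the example in this README.)
check_version_file() {
    file=${1:-version}
    [ -f "$file" ] || { echo "$file: missing" >&2; return 1; }
    lines=$(wc -l < "$file" | tr -d ' ')
    [ "$lines" -eq 1 ] || { echo "$file: expected a single line" >&2; return 1; }
    grep -Eq '^[0-9]+\.[0-9]+\.[0-9]+$' "$file" || {
        echo "$file: does not look like a Unicode version" >&2
        return 1
    }
}
```

Run it as "check_version_file version" from this directory; it prints a
reason on stderr and returns non-zero when the file does not fit.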

To be 8.3 filesystem friendly, the names of some of the input files have been
changed from the values that are in the Unicode DB. Not all of the Test files
are currently used, so some may not be present and the corresponding mv's can
fail. The .html Test files are not touched.

mv PropertyValueAliases.txt PropValueAliases.txt
mv NamedSequencesProv.txt NamedSqProv.txt
mv DerivedAge.txt DAge.txt
mv DerivedCoreProperties.txt DCoreProperties.txt
mv DerivedNormalizationProps.txt DNormalizationProps.txt
mv extracted/DerivedBidiClass.txt extracted/DBidiClass.txt
mv extracted/DerivedBinaryProperties.txt extracted/DBinaryProperties.txt
mv extracted/DerivedCombiningClass.txt extracted/DCombiningClass.txt
mv extracted/DerivedDecompositionType.txt extracted/DDecompositionType.txt
mv extracted/DerivedEastAsianWidth.txt extracted/DEastAsianWidth.txt
mv extracted/DerivedGeneralCategory.txt extracted/DGeneralCategory.txt
mv extracted/DerivedJoiningGroup.txt extracted/DJoinGroup.txt
mv extracted/DerivedJoiningType.txt extracted/DJoinType.txt
mv extracted/DerivedLineBreak.txt extracted/DLineBreak.txt
mv extracted/DerivedNumericType.txt extracted/DNumType.txt
mv extracted/DerivedNumericValues.txt extracted/DNumValues.txt

mv auxiliary/GraphemeBreakTest.txt auxiliary/GCBTest.txt
mv auxiliary/LineBreakTest.txt auxiliary/LBTest.txt
mv auxiliary/SentenceBreakTest.txt auxiliary/SBTest.txt
mv auxiliary/WordBreakTest.txt auxiliary/WBTest.txt
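Since some of the Test files may be absent, one way to run the renames
without a string of mv errors is to wrap mv in a small function that skips
missing sources. mv_if_present is a hypothetical name used only for this
sketch; it is not part of mktables or the Perl build.

```shell
# Hypothetical helper: rename a file only if the source is present, so a
# missing (unused) Test file is reported and skipped instead of making mv fail.
mv_if_present() {
    if [ -e "$1" ]; then
        mv "$1" "$2"
    else
        echo "skipping $1 (not present)" >&2
    fi
}

# Example usage, run from the directory holding the Unicode files:
#   mv_if_present auxiliary/GraphemeBreakTest.txt auxiliary/GCBTest.txt
#   mv_if_present auxiliary/LineBreakTest.txt     auxiliary/LBTest.txt
```

The function still returns mv's own exit status when the source exists, so
genuine failures (permissions, a missing target directory) are not hidden.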

If you have the Unihan database (5.2 and above), you should also do the
following:

mv Unihan_DictionaryIndices.txt UnihanIndicesDictionary.txt
mv Unihan_DictionaryLikeData.txt UnihanDataDictionaryLike.txt
mv Unihan_IRGSources.txt UnihanIRGSources.txt
mv Unihan_NumericValues.txt UnihanNumericValues.txt
mv Unihan_OtherMappings.txt UnihanOtherMappings.txt
mv Unihan_RadicalStrokeCounts.txt UnihanRadicalStrokeCounts.txt
mv Unihan_Readings.txt UnihanReadings.txt
mv Unihan_Variants.txt UnihanVariants.txt

If you download everything, the names of files that are not used by mktables
are not changed by the above, and will not work correctly as-is on 8.3
filesystems.
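To spot leftover names that would clash on an 8.3 filesystem, something like
the following sketch may help. It assumes that "8.3 friendly" means the names
stay unique once the base is cut to 8 characters and the extension to 3,
compared case-insensitively; that reading is inferred from the renames above
rather than stated here. find_83_collisions is a hypothetical helper.

```shell
# Hypothetical helper: print each 8.3-truncated, upper-cased name that is
# shared by more than one *.txt file in the given directory (default: .).
# Assumption: two names clash when they agree after truncation to 8.3.
find_83_collisions() {
    dir=${1:-.}
    for f in "$dir"/*.txt; do
        [ -e "$f" ] || continue          # the glob matched nothing
        name=${f##*/}                    # strip the directory part
        base=${name%.*}                  # part before the last dot
        ext=${name##*.}                  # part after the last dot
        printf '%s.%s\n' \
            "$(printf '%s' "$base" | cut -c1-8)" \
            "$(printf '%s' "$ext" | cut -c1-3)"
    done | tr '[:lower:]' '[:upper:]' | sort | uniq -d
}
```

Running "find_83_collisions extracted" before the renames would, under that
assumption, flag the Derived* files that share their first eight characters;
no output means no clash was found.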

mktables is used to generate the tables used by the rest of Perl. It will warn
you about any *.txt files in the directory substructure that it doesn't know
about. You should remove any so-identified, or edit mktables to add them to
its lists to process. You can run

    mktables -globlist

to have it try to process these tables generically.

FOR PUMPKINS

The files are inter-related. If you take the latest UnicodeData.txt, for
example, but leave the older versions of other files, there can be subtle
problems. So get everything available from Unicode, and delete those which
aren't needed.

When moving to a new version of Unicode, you need to update 'version' by hand:

    p4 edit version
    ...

You should look in the Unicode release notes (which are probably towards the
bottom of http://www.unicode.org/reports/tr44/) to see if any properties have
newly been moved to be Obsolete, Deprecated, or Stabilized. The full names for
these should be added to the respective lists near the beginning of mktables,
using an 'if' to add them for just this Unicode version going forward, so that
mktables can continue to be used for earlier Unicode versions.

When putting out a new Perl release, consider whether any of the Deprecated
properties should be moved to Suppressed.

perlrecharclass.pod has a list of all the characters that are white space,
which needs to be updated if there are changes. A quick way to check if there
have been changes would be to see if the number of such characters listed in
perluniprops.pod (generated by running mktables) for the property
\p{White_Space} is no longer 26. Further investigation would then be necessary
to classify the new characters as horizontal or vertical.

The code in regexec.c for the \X match construct is intimately tied to the
regular expression in UAX #29 (http://www.unicode.org/reports/tr29/). You
should see if it has changed, and if so regexec.c should be modified. The
current one is
( CRLF
| Prepend* ( Hangul-syllable | !Control )
  ( Grapheme_Extend | Spacing_Mark)*
| . )

mktables has many checks to warn you if there are unexpected or novel things
that it doesn't know how to handle.

Finally:

    p4 submit

--
jhi@iki.fi; updated by nick@ccl4.org, public@khwilliamson.com