<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" "http://www.w3.org/TR/REC-html40/loose.dtd">
<html>

<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
<meta http-equiv="Content-Language" content="en-us">
<meta name="GENERATOR" content="Microsoft FrontPage 4.0">
<meta name="ProgId" content="FrontPage.Editor.Document">
<meta name="keywords"
content="unicode, normalization, composition, decomposition">
<meta name="description" content="Describes PropList.html">
<title>UCD: Extended Character Properties</title>
<link rel="stylesheet" type="text/css" href="http://www.unicode.org/unicode.css">
</head>

<body bgcolor="#ffffff">

<table width="100%" cellpadding="0" cellspacing="0" border="0">
 <tr>
 <td>
 <table width="100%" border="0" cellpadding="0" cellspacing="0">
 <tr>
 <td class="icon"><a href="http://www.unicode.org"><img border="0"
 src="http://www.unicode.org/webscripts/logo60s2.gif" align="middle"
 alt="[Unicode]" width="34" height="33"></a>&nbsp;&nbsp;<a
 class="bar" href="UnicodeCharacterDatabase.html">Unicode Character
 Database</a></td>
 </tr>
 </table>
 </td>
 </tr>
 <tr>
 <td class="gray">&nbsp;</td>
 </tr>
</table>
<h1>Extended Character Properties</h1>
<table height="87" cellspacing="2" cellpadding="0" width="100%" border="1">
 <tbody>
 <tr>
 <td valign="top" width="144">Revision</td>
 <td valign="top">3.1.1</td>
 </tr>
 <tr>
 <td valign="top" width="144">Authors</td>
 <td valign="top">Mark Davis</td>
 </tr>
 <tr>
 <td valign="top" width="144">Date</td>
 <td valign="top">2001-07-12</td>
 </tr>
 <tr>
 <td valign="top" width="144">This Version</td>
 <td valign="top"><a
 href="http://www.unicode.org/Public/3.1-Update1/PropList-3.1.1.html">http://www.unicode.org/Public/3.1-Update1/PropList-3.1.1.html</a></td>
 </tr>
 <tr>
 <td valign="top" width="144">Previous Version</td>
 <td valign="top">n/a</td>
 </tr>
 <tr>
 <td valign="top" width="144">Latest Version</td>
 <td valign="top"><a
 href="http://www.unicode.org/Public/UNIDATA/PropList.html">http://www.unicode.org/Public/UNIDATA/PropList.html</a></td>
 </tr>
 </tbody>
</table>
<h3><i><br>
Summary</i></h3>
<blockquote>
 <p><i>This document describes the format and content of the PropList.txt data
 file in the Unicode Character Database (UCD).</i></p>
</blockquote>
<h3><i>Status</i></h3>
<blockquote>
 <p><i>This file and the files described herein are part of the Unicode
 Character Database and are governed by the <a href="#UCD_Terms">UCD Terms of Use</a>
 given below.</i></p>
 <p><i>For general information on file formats and table formats, and the
 implications of normative vs. informative properties, see
 UnicodeCharacterDatabase.html.</i></p>
 <p><i><b>Warning: </b>The information in this file does not completely
 describe the use and interpretation of Unicode character properties and
 behavior. It must be used in conjunction with the data in the other files in
 the UCD, and relies on the notation and definitions supplied in <a
 href="http://www.unicode.org/unicode/standard/versions/Unicode3.0.html">The
 Unicode Standard</a>. All chapter references are to Version 3.1.0 of the
 standard.</i></p>
</blockquote>
<hr width="50%">
<h2>Introduction</h2>
<p align="left">PropList.txt contains extended properties that supplement the
General Category property described in UnicodeData.html. Unlike the derived
properties, the properties in PropList.txt cannot be derived directly from
UnicodeData.txt or other data files of the UCD. These properties are listed in
the following table.</p>
<div align="center">
 <center>
 <table border="1" cellspacing="0" cellpadding="3" class="smallText">
 <tr>
 <th>Property Value</th>
 <th>Normative (N) or Informative (I)</th>
 <th>Definition and Usage</th>
 </tr>
 <tr>
 <th valign="top">White_space</th>
 <th valign="top">N</th>
 <td valign="top">Space characters and those format control characters
 (such as TAB, CR and LF) which should be treated by programming
 languages as &quot;white space&quot; for the purpose of parsing
 elements.
 <p><b>Note:</b> ZERO WIDTH SPACE and ZERO WIDTH NO-BREAK SPACE are not
 included, since their functions are restricted to line-break control.
 Their names are unfortunately misleading in this respect.</p>
 <p><b>Note: </b>There are other senses of &quot;whitespace&quot; that
 encompass a different set of characters.</p>
 </td>
 </tr>
 <tr>
 <th valign="top">Bidi_Control</th>
 <th valign="top">N</th>
 <td valign="top">Those format control characters which have specific
 functions in the Bidirectional Algorithm.</td>
 </tr>
 <tr>
 <th valign="top">Join_Control</th>
 <th valign="top">N</th>
 <td valign="top">Those format control characters which have specific
 functions for control of cursive joining and ligation.</td>
 </tr>
 <tr>
 <th valign="top">ASCII_Hex_Digit</th>
 <th valign="top">N</th>
 <td valign="top">ASCII characters commonly used for the representation of
 hexadecimal numbers.</td>
 </tr>
 <tr>
 <th valign="top">Dash</th>
 <th valign="top">I</th>
 <td valign="top">Those punctuation characters explicitly called out as
 dashes in the Unicode Standard, plus compatibility equivalents to those.
 Most of these have the Pd General Category, but some have the Sm General
 Category because of their use in mathematics.</td>
 </tr>
 <tr>
 <th valign="top">Hyphen</th>
 <th valign="top">I</th>
 <td valign="top">Those dashes used to mark connections between pieces of
 words, plus the Katakana middle dot. The Katakana middle dot functions
 like a hyphen, but is shaped like a dot rather than a dash.</td>
 </tr>
 <tr>
 <th valign="top">Quotation_Mark</th>
 <th valign="top">I</th>
 <td valign="top">Those punctuation characters that function as quotation
 marks.</td>
 </tr>
 <tr>
 <th valign="top">Terminal_Punctuation</th>
 <th valign="top">I</th>
 <td valign="top">Those punctuation characters that generally mark the end
 of textual units.</td>
 </tr>
 <tr>
 <th valign="top">Other_Math</th>
 <th valign="top">I</th>
 <td valign="top">Math characters that do not have the Sm General Category.</td>
 </tr>
 <tr>
 <th valign="top">Hex_Digit</th>
 <th valign="top">I</th>
 <td valign="top">Characters commonly used for the representation of
 hexadecimal numbers, plus their compatibility equivalents.</td>
 </tr>
 <tr>
 <th valign="top">Other_Alphabetic</th>
 <th valign="top">I</th>
 <td valign="top">Alphabetic characters that do not have L as their major
 class for the General Category (Lu, Ll, Lt, Lm, Lo).</td>
 </tr>
 <tr>
 <th valign="top">Ideographic</th>
 <th valign="top">I</th>
 <td valign="top">Characters considered to be CJKV (Chinese, Japanese,
 Korean, and Vietnamese) ideographs.</td>
 </tr>
 <tr>
 <th valign="top">Diacritic</th>
 <th valign="top">I</th>
 <td valign="top">Characters that linguistically modify the meaning of
 another character to which they apply. Some diacritics are not combining
 characters, and some combining characters are not diacritics.</td>
 </tr>
 <tr>
 <th valign="top">Extender</th>
 <th valign="top">I</th>
 <td valign="top">Characters whose principal function is to extend the
 value or shape of a preceding alphabetic character. Typical of these are
 length and iteration marks.</td>
 </tr>
 <tr>
 <th valign="top">Other_Lowercase</th>
 <th valign="top">I</th>
 <td valign="top">Lowercase characters that do not have the Ll General
 Category.</td>
 </tr>
 <tr>
 <th valign="top">Other_Uppercase</th>
 <th valign="top">I</th>
 <td valign="top">Uppercase characters that do not have the Lu General
 Category.</td>
 </tr>
 <tr>
 <th valign="top">Noncharacter_Code_Point</th>
 <th valign="top">N</th>
 <td valign="top">Code points that are explicitly defined as illegal for
 the encoding of characters. See <a
 href="http://www.unicode.org/unicode/reports/tr27/">Unicode 3.1</a> for
 more information.</td>
 </tr>
 </table>
 </center>
</div>
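<p align="left">As an illustration of how the properties in the table above
might be consumed, the following Python sketch parses a few lines in the layout
assumed here for PropList.txt (a code point or <code>first..last</code> range, a
semicolon, the property name, and an optional <code>#</code> comment) and tests
a character against a property. The sample lines, the function names
<code>parse_proplist</code> and <code>has_property</code>, and the field layout
are assumptions made for this example, not part of this document; the data file
itself remains the authority.</p>
<pre>
```python
from collections import defaultdict

# Hypothetical excerpt in the assumed "range ; property # comment" layout.
SAMPLE = """\
# Hypothetical excerpt, not the real data file
0009..000D    ; White_space # Cc   [5] control characters
0020          ; White_space # Zs       SPACE
200C..200D    ; Join_Control # Cf   [2] ZERO WIDTH NON-JOINER..ZERO WIDTH JOINER
"""


def parse_proplist(text):
    """Map each property name to a list of inclusive (first, last) ranges."""
    props = defaultdict(list)
    for raw in text.splitlines():
        # Drop the trailing comment, then skip blank or comment-only lines.
        line = raw.split("#", 1)[0].strip()
        if not line:
            continue
        code_field, name = (field.strip() for field in line.split(";", 1))
        if ".." in code_field:
            first, last = code_field.split("..")
        else:
            first = last = code_field
        props[name].append((int(first, 16), int(last, 16)))
    return dict(props)


def has_property(props, name, ch):
    """Return True if the code point of ch falls in any range listed for name."""
    cp = ord(ch)
    return any(cp in range(lo, hi + 1) for lo, hi in props.get(name, []))


props = parse_proplist(SAMPLE)
```
</pre>
<p align="left">With the sample data, <code>has_property(props, "White_space",
"\t")</code> is true, while ZERO WIDTH SPACE (U+200B) is not matched, which is
consistent with the note in the White_space row.</p>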
<h2><i><a name="UCD_Terms"><br>
UCD Terms of Use</a></i></h2>
<h3><i>Disclaimer</i></h3>
<blockquote>
 <p><i>The Unicode Character Database is provided as is by Unicode, Inc. No
 claims are made as to fitness for any particular purpose. No warranties of any
 kind are expressed or implied. The recipient agrees to determine applicability
 of information provided. If this file has been purchased on magnetic or
 optical media from Unicode, Inc., the sole remedy for any claim will be
 exchange of defective media within 90 days of receipt.</i></p>
 <p><i>This disclaimer is applicable for all other data files accompanying the
 Unicode Character Database, some of which have been compiled by the Unicode
 Consortium, and some of which have been supplied by other sources.</i></p>
</blockquote>
<h3><i>Limitations on Rights to Redistribute This Data</i></h3>
<blockquote>
 <p><i>Recipient is granted the right to make copies in any form for internal
 distribution and to freely use the information supplied in the creation of
 products supporting the Unicode<sup>TM</sup> Standard. The files in the
 Unicode Character Database can be redistributed to third parties or other
 organizations (whether for profit or not) as long as this notice and the
 disclaimer notice are retained. Information can be extracted from these files
 and used in documentation or programs, as long as there is an accompanying
 notice indicating the source.</i></p>
</blockquote>
<hr width="50%">
<p align="center"><a href="http://www.unicode.org/unicode/copyright.html"><img
src="http://www.unicode.org/img/hb_home.gif" border="0" alt="Home" width="40"
height="49"><img src="http://www.unicode.org/img/hb_mid.gif" border="0"
alt="Terms of Use" width="152" height="49"><img
src="http://www.unicode.org/img/hb_mail.gif" border="0" alt="E-mail" width="46"
height="49"></a>

</body>

</html>