<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" "http://www.w3.org/TR/REC-html40/loose.dtd">
<html>

<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
<meta http-equiv="Content-Language" content="en-us">
<meta name="GENERATOR" content="Microsoft FrontPage 4.0">
<meta name="ProgId" content="FrontPage.Editor.Document">
<meta name="keywords"
content="unicode, normalization, composition, decomposition">
<meta name="description" content="Describes PropList.html">
<title>UCD: Extended Character Properties</title>
<link rel="stylesheet" type="text/css" href="http://www.unicode.org/unicode.css">
</head>

<body bgcolor="#ffffff">

<table width="100%" cellpadding="0" cellspacing="0" border="0">
<tr>
<td>
<table width="100%" border="0" cellpadding="0" cellspacing="0">
<tr>
<td class="icon"><a href="http://www.unicode.org"><img border="0"
src="http://www.unicode.org/webscripts/logo60s2.gif" align="middle"
alt="[Unicode]" width="34" height="33"></a> <a
class="bar" href="UnicodeCharacterDatabase.html">Unicode Character
Database</a></td>
</tr>
</table>
</td>
</tr>
<tr>
<td class="gray"> </td>
</tr>
</table>
<h1>Extended Character Properties</h1>
<table height="87" cellspacing="2" cellpadding="0" width="100%" border="1">
<tbody>
<tr>
<td valign="top" width="144">Revision</td>
<td valign="top">3.1.1</td>
</tr>
<tr>
<td valign="top" width="144">Authors</td>
<td valign="top">Mark Davis</td>
</tr>
<tr>
<td valign="top" width="144">Date</td>
<td valign="top">2001-07-12</td>
</tr>
<tr>
<td valign="top" width="144">This Version</td>
<td valign="top"><a
href="http://www.unicode.org/Public/3.1-Update1/PropList-3.1.1.html">http://www.unicode.org/Public/3.1-Update1/PropList-3.1.1.html</a></td>
</tr>
<tr>
<td valign="top" width="144">Previous Version</td>
<td valign="top">n/a</td>
</tr>
<tr>
<td valign="top" width="144">Latest Version</td>
<td valign="top"><a
href="http://www.unicode.org/Public/UNIDATA/PropList.html">http://www.unicode.org/Public/UNIDATA/PropList.html</a></td>
</tr>
</tbody>
</table>
<h3><i><br>
Summary</i></h3>
<blockquote>
<p><i>This document describes the format and content of the PropList.txt data
file in the Unicode Character Database (UCD).</i></p>
</blockquote>
<h3><i>Status</i></h3>
<blockquote>
<p><i>This file and the files it describes are part of the Unicode
Character Database and are governed by the <a href="#UCD_Terms">UCD Terms of Use</a>
given below.</i></p>
<p><i>For general information on file and table formats, and on the
implications of normative versus informative properties, see
<a href="UnicodeCharacterDatabase.html">UnicodeCharacterDatabase.html</a>.</i></p>
<p><i><b>Warning: </b>the information in this file does not completely
describe the use and interpretation of Unicode character properties and
behavior. It must be used in conjunction with the data in the other files in
the UCD, and relies on the notation and definitions supplied in <a
href="http://www.unicode.org/unicode/standard/versions/Unicode3.0.html">The
Unicode Standard</a>. All chapter references are to Version 3.1.0 of the
standard.</i></p>
</blockquote>
<hr width="50%">
<h2>Introduction</h2>
<p align="left">PropList.txt contains extended properties that supplement the
General Category property described in UnicodeData.html. Unlike the derived
properties, the properties in PropList.txt cannot be derived directly from
UnicodeData.txt or other data files of the UCD. These properties are listed in
the following table. The N/I column indicates whether each property is
normative (N) or informative (I).</p>
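<p>As an illustration only, a program might load PropList.txt into a table of code point ranges along the lines of the Python sketch below. It assumes that each data line follows the semicolon-delimited layout used by other UCD files: a single code point or a range written XXXX..YYYY, a semicolon, the property name, and an optional # comment. The function names parse_proplist and has_property and the SAMPLE lines are hypothetical, so check the layout against the actual file before relying on this sketch.</p>
<pre>
```python
from collections import defaultdict


def parse_proplist(text):
    """Map each property name to a list of (first, last) code point ranges.

    Assumes (hypothetically) data lines of the form
    'XXXX[..YYYY] ; Property_Name # comment'; blank and comment-only
    lines are skipped.
    """
    props = defaultdict(list)
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop any trailing comment
        if not line:
            continue
        code_field, prop = [field.strip() for field in line.split(";", 1)]
        if ".." in code_field:
            first, last = code_field.split("..")
        else:
            first = last = code_field
        props[prop].append((int(first, 16), int(last, 16)))
    return dict(props)


def has_property(props, prop, cp):
    """Return True if code point cp falls in any range listed for prop."""
    return any(cp in range(first, last + 1) for first, last in props.get(prop, []))


# Illustrative lines in the assumed layout (not a verbatim excerpt of the file).
SAMPLE = """\
# comment-only line
0009..000D    ; White_space # Cc   [5]
0020          ; White_space # Zs       SPACE
0030..0039    ; ASCII_Hex_Digit # Nd  [10] DIGIT ZERO..DIGIT NINE
"""

props = parse_proplist(SAMPLE)
print(props["White_space"])                              # [(9, 13), (32, 32)]
print(has_property(props, "White_space", 0x000A))        # True (LF)
print(has_property(props, "ASCII_Hex_Digit", ord("g")))  # False
```
</pre>
<p>Keeping ranges, rather than expanding them to individual code points, keeps the table small. A larger implementation might sort and merge the ranges and search them by bisection instead of using the linear scan shown here.</p>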
<div align="center">
<center>
<table border="1" cellspacing="0" cellpadding="3" class="smallText">
<tr>
<th>Property Value</th>
<th>N/I</th>
<th>Definition and Usage</th>
</tr>
<tr>
<th valign="top">White_space</th>
<th valign="top">N</th>
<td valign="top">Space characters, plus those control characters
(such as TAB, CR, and LF) that programming languages should treat as
"white space" when parsing elements.
<p><b>Note:</b> ZERO WIDTH SPACE and ZERO WIDTH NO-BREAK SPACE are not
included, since their functions are restricted to line-break control.
Their names are unfortunately misleading in this respect.</p>
<p><b>Note: </b>There are other senses of "whitespace" that
encompass a different set of characters.</p>
</td>
</tr>
<tr>
<th valign="top">Bidi_Control</th>
<th valign="top">N</th>
<td valign="top">Those format control characters which have specific
functions in the Bidirectional Algorithm.</td>
</tr>
<tr>
<th valign="top">Join_Control</th>
<th valign="top">N</th>
<td valign="top">Those format control characters which have specific
functions for control of cursive joining and ligation.</td>
</tr>
<tr>
<th valign="top">ASCII_Hex_Digit</th>
<th valign="top">N</th>
<td valign="top">ASCII characters commonly used for the representation of
hexadecimal numbers.</td>
</tr>
<tr>
<th valign="top">Dash</th>
<th valign="top">I</th>
<td valign="top">Those punctuation characters explicitly called out as
dashes in the Unicode Standard, plus compatibility equivalents to those.
Most of these have the Pd General Category, but some have the Sm General
Category because of their use in mathematics.</td>
</tr>
<tr>
<th valign="top">Hyphen</th>
<th valign="top">I</th>
<td valign="top">Those dashes used to mark connections between pieces of
words, plus the Katakana middle dot. The Katakana middle dot functions
like a hyphen, but is shaped like a dot rather than a dash.</td>
</tr>
<tr>
<th valign="top">Quotation_Mark</th>
<th valign="top">I</th>
<td valign="top">Those punctuation characters that function as quotation
marks.</td>
</tr>
<tr>
<th valign="top">Terminal_Punctuation</th>
<th valign="top">I</th>
<td valign="top">Those punctuation characters that generally mark the end
of textual units.</td>
</tr>
<tr>
<th valign="top">Other_Math</th>
<th valign="top">I</th>
<td valign="top">Math characters that do not have the Sm General Category.</td>
</tr>
<tr>
<th valign="top">Hex_Digit</th>
<th valign="top">I</th>
<td valign="top">Characters commonly used for the representation of
hexadecimal numbers, plus their compatibility equivalents.</td>
</tr>
<tr>
<th valign="top">Other_Alphabetic</th>
<th valign="top">I</th>
<td valign="top">Alphabetic characters whose General Category is not one of
the letter categories (Lu, Ll, Lt, Lm, Lo).</td>
</tr>
<tr>
<th valign="top">Ideographic</th>
<th valign="top">I</th>
<td valign="top">Characters considered to be CJKV (Chinese, Japanese,
Korean, and Vietnamese) ideographs.</td>
</tr>
<tr>
<th valign="top">Diacritic</th>
<th valign="top">I</th>
<td valign="top">Characters that linguistically modify the meaning of
another character to which they apply. Some diacritics are not combining
characters, and some combining characters are not diacritics.</td>
</tr>
<tr>
<th valign="top">Extender</th>
<th valign="top">I</th>
<td valign="top">Characters whose principal function is to extend the
value or shape of a preceding alphabetic character. Typical of these are
length and iteration marks.</td>
</tr>
<tr>
<th valign="top">Other_Lowercase</th>
<th valign="top">I</th>
<td valign="top">Lowercase characters that do not have the Ll General
Category.</td>
</tr>
<tr>
<th valign="top">Other_Uppercase</th>
<th valign="top">I</th>
<td valign="top">Uppercase characters that do not have the Lu General
Category.</td>
</tr>
<tr>
<th valign="top">Noncharacter_Code_Point</th>
<th valign="top">N</th>
<td valign="top">Code points that are explicitly defined as illegal for
the encoding of characters. See <a
href="http://www.unicode.org/unicode/reports/tr27/">Unicode 3.1</a> for
more information.</td>
</tr>
</table>
</center>
</div>
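<p>Several of the Other_* properties are meant to be combined with the General Category. As the table's wording implies, the lowercase characters are the Ll characters plus those listed as Other_Lowercase, and similar combinations hold for Other_Uppercase, Other_Math, and Other_Alphabetic. The Python sketch below illustrates these combinations under stated assumptions. PROPS is a hypothetical, hand-written excerpt rather than data read from PropList.txt, and the helper names are illustrative. In addition, Python's unicodedata module reports General Category values from the Unicode version the interpreter was built with (see unicodedata.unidata_version), which is newer than 3.1.1.</p>
<pre>
```python
import unicodedata

# Hypothetical excerpt of PropList.txt data: property name -> (first, last) ranges.
# Entries are illustrative only; consult the real file for actual membership.
PROPS = {
    "Other_Lowercase": [(0x02B0, 0x02B8)],  # MODIFIER LETTER SMALL H..SMALL Y (Lm)
    "Other_Math": [(0x005E, 0x005E)],       # CIRCUMFLEX ACCENT (Sk)
}


def in_ranges(props, prop, cp):
    """True if code point cp lies in one of the ranges listed for prop."""
    return any(cp in range(first, last + 1) for first, last in props.get(prop, []))


def general_category(cp):
    # Uses the interpreter's Unicode data, not the 3.1.1 UnicodeData.txt.
    return unicodedata.category(chr(cp))


def is_lowercase(cp, props):
    # Ll plus Other_Lowercase
    return general_category(cp) == "Ll" or in_ranges(props, "Other_Lowercase", cp)


def is_uppercase(cp, props):
    # Lu plus Other_Uppercase
    return general_category(cp) == "Lu" or in_ranges(props, "Other_Uppercase", cp)


def is_math(cp, props):
    # Sm plus Other_Math
    return general_category(cp) == "Sm" or in_ranges(props, "Other_Math", cp)


def is_alphabetic(cp, props):
    # Major class L (Lu, Ll, Lt, Lm, Lo), as in the table, plus Other_Alphabetic
    return general_category(cp).startswith("L") or in_ranges(props, "Other_Alphabetic", cp)


print(is_lowercase(ord("a"), PROPS))   # True: Ll
print(is_lowercase(0x02B0, PROPS))     # True: Lm, but listed in the excerpt
print(is_math(ord("+"), PROPS))        # True: Sm
print(is_math(ord("^"), PROPS))        # True: Sk, but listed in the excerpt
print(is_alphabetic(ord("1"), PROPS))  # False: Nd and not listed
```
</pre>
<p>A complete implementation would load PROPS from the actual file and take General Category values from the UnicodeData.txt of the same release, so that both inputs describe the same version of the standard.</p>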
<h2><i><a name="UCD_Terms"><br>
UCD Terms of Use</a></i></h2>
<h3><i>Disclaimer</i></h3>
<blockquote>
<p><i>The Unicode Character Database is provided as is by Unicode, Inc. No
claims are made as to fitness for any particular purpose. No warranties of any
kind are expressed or implied. The recipient agrees to determine applicability
of information provided. If this file has been purchased on magnetic or
optical media from Unicode, Inc., the sole remedy for any claim will be
exchange of defective media within 90 days of receipt.</i></p>
<p><i>This disclaimer is applicable for all other data files accompanying the
Unicode Character Database, some of which have been compiled by the Unicode
Consortium, and some of which have been supplied by other sources.</i></p>
</blockquote>
<h3><i>Limitations on Rights to Redistribute This Data</i></h3>
<blockquote>
<p><i>Recipient is granted the right to make copies in any form for internal
distribution and to freely use the information supplied in the creation of
products supporting the Unicode<sup>TM</sup> Standard. The files in the
Unicode Character Database can be redistributed to third parties or other
organizations (whether for profit or not) as long as this notice and the
disclaimer notice are retained. Information can be extracted from these files
and used in documentation or programs, as long as there is an accompanying
notice indicating the source.</i></p>
</blockquote>
<hr width="50%">
<p align="center"><a href="http://www.unicode.org/unicode/copyright.html"><img
src="http://www.unicode.org/img/hb_home.gif" border="0" alt="Home" width="40"
height="49"><img src="http://www.unicode.org/img/hb_mid.gif" border="0"
alt="Terms of Use" width="152" height="49"><img
src="http://www.unicode.org/img/hb_mail.gif" border="0" alt="E-mail" width="46"
height="49"></a>

</body>

</html>