Commit | Line | Data |
---|---|---|
505afebf JH |
1 | <html> |
2 | ||
3 | <head> | |
4 | <meta name="GENERATOR" content="Microsoft FrontPage 3.0"> | |
5 | <title>Unicode 3.0 NamesList File Structure</title> | |
6 | </head> | |
7 | ||
8 | <body> | |
9 | ||
10 | <h3>Unicode NamesList File Format</h3> | |
11 | ||
12 | <p>Last updated: 1999-07-06</p> | |
13 | ||
14 | <h3>1.0 Introduction</h3> | |
15 | ||
16 | <p>The Unicode name list file NamesList.txt (also NamesList.lst) is a plain text file used | |
17 | to drive the layout of the character code charts in the Unicode Standard. The information | |
18 | in this file is a combination of several fields from the UnicodeData.txt and Blocks.txt files, | |
19 | together with additional annotations for many characters. This document describes the | |
20 | syntax rules for the file format, but also gives brief information on how each construct | |
21 | is rendered when laid out for the book. Some of the syntax elements were used in | |
22 | preparation of the drafts of the book and may not be present in the final, released form | |
23 | of the NamesList.txt file.</p> | |
24 | ||
25 | <p>The same input file can be used to do the draft preparation for ISO/IEC 10646 (referred | |
26 | to below as ISO-style). This necessitates the presence of some information in the name list | |
27 | file that is not needed (and is in fact removed during parsing) for the Unicode book.</p> | |
28 | ||
29 | <p>With access to the layout program (unibook.exe), it is a simple matter to create | |
30 | name lists for formatting working drafts that contain proposed characters.</p> | |
31 | ||
32 | <h3>1.1 NamesList File Overview</h3> | |
33 | ||
34 | <p>The *.lst files are plain text files which in their simplest form look like this:</p> | |
35 | ||
36 | <p>@@<tab>0020<tab>BASIC LATIN<tab>007F<br> | |
37 | ; this is a file comment (ignored)<br> | |
38 | 0020<tab>SPACE<br> | |
39 | 0021<tab>EXCLAMATION MARK<br> | |
40 | 0022<tab>QUOTATION MARK<br> | |
41 | . . . <br> | |
42 | 007F<tab>DELETE</p> | |
43 | ||
44 | <p>The semicolon (as first character), @ and <tab> characters are used by the file | |
45 | syntax and must be provided as shown. Hexadecimal digits must be in UPPER CASE. A double | |
46 | @@ introduces a block header, with the title and the start and end codes of the block | |
47 | provided as shown.</p> | |
48 | ||
49 | <p>For an ISO-style, minimal name list, only the NAME_LINE and BLOCKHEADER and their | |
50 | constituent syntax elements are needed.</p> | |
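<p>As an illustration only, a minimal name list of this kind could be read with a few lines of Python. This is a sketch, not the logic used by unibook.exe: the function name <code>parse_minimal</code> and both regular expressions are assumptions made for the example, and codes are assumed to be four or more upper-case hexadecimal digits.</p>

```python
import re

# Assumed patterns, following the overview: upper-case hex codes and
# fields separated by one or more tab characters.
BLOCKHEADER = re.compile(r"^@@\t+([0-9A-F]{4,})\t+(.+?)\t+([0-9A-F]{4,})$")
NAME_LINE = re.compile(r"^([0-9A-F]{4,})\t+(.+)$")


def parse_minimal(text):
    """Return (blocks, names) for a minimal, ISO-style name list.

    blocks is a list of (start, title, end) tuples; names maps a code
    point to its character name. Unrecognized lines are skipped.
    """
    blocks, names = [], {}
    for line in text.splitlines():
        if not line or line.startswith(";"):
            continue  # empty lines and file comments are ignored
        m = BLOCKHEADER.match(line)
        if m:
            start, title, end = m.groups()
            blocks.append((int(start, 16), title, int(end, 16)))
            continue
        m = NAME_LINE.match(line)
        if m:
            names[int(m.group(1), 16)] = m.group(2)
    return blocks, names


sample = (
    "@@\t0020\tBASIC LATIN\t007F\n"
    "; this is a file comment (ignored)\n"
    "0020\tSPACE\n"
    "0021\tEXCLAMATION MARK\n"
)
blocks, names = parse_minimal(sample)
```

<p>Storing the codes as integers makes it easy to check that each name falls between the start and end code of its block.</p>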
51 | ||
52 | <p>The full syntax with all the options is provided in the following sections.</p> | |
53 | ||
54 | <h3>1.2 NamesList File Structure</h3> | |
55 | ||
56 | <p>This section defines the overall file structure.</p> | |
57 | ||
58 | <pre><strong>NAMELIST: TITLE_PAGE* BLOCK* | |
59 | </strong> | |
60 | <strong>TITLE_PAGE: TITLE | |
61 | | TITLE_PAGE SUBTITLE | |
62 | | TITLE_PAGE SUBHEADER | |
63 | | TITLE_PAGE IGNORED_LINE | |
64 | | TITLE_PAGE EMPTY_LINE | |
65 | | TITLE_PAGE COMMENT_LINE | |
66 | | TITLE_PAGE NOTICE | |
67 | | TITLE_PAGE PAGEBREAK | |
68 | </strong> | |
69 | <strong>BLOCK: BLOCKHEADER | |
70 | | BLOCK CHAR_ENTRY | |
71 | | BLOCK SUBHEADER | |
72 | | BLOCK NOTICE | |
73 | | BLOCK EMPTY_LINE | |
74 | | BLOCK IGNORED_LINE | |
75 | | BLOCK PAGEBREAK | |
76 | ||
77 | CHAR_ENTRY: NAME_LINE | RESERVED_LINE | |
78 | | CHAR_ENTRY ALIAS_LINE | |
79 | | CHAR_ENTRY COMMENT_LINE | |
80 | | CHAR_ENTRY CROSS_REF | |
81 | | CHAR_ENTRY DECOMPOSITION | |
82 | | CHAR_ENTRY COMPAT_MAPPING | |
83 | | CHAR_ENTRY IGNORED_LINE | |
84 | | CHAR_ENTRY EMPTY_LINE | |
85 | | CHAR_ENTRY NOTICE | |
86 | </strong></pre> | |
87 | ||
88 | <p>In other words:<br> | |
89 | <br> | |
90 | Neither TITLE nor SUBTITLE may occur after the first BLOCKHEADER. </p> | |
91 | ||
92 | <p>Only TITLE, SUBTITLE, SUBHEADER, PAGEBREAK, COMMENT_LINE, and IGNORED_LINE may | |
93 | occur before the first BLOCKHEADER.</p> | |
94 | ||
95 | <p>Directly following either a NAME_LINE or a RESERVED_LINE, an uninterrupted sequence of | |
96 | the following lines may occur (in any order and repeated as often as needed): ALIAS_LINE, | |
97 | CROSS_REF, DECOMPOSITION, COMPAT_MAPPING, NOTICE, EMPTY_LINE and IGNORED_LINE.</p> | |
98 | ||
99 | <p>Except for EMPTY_LINE, NOTICE and IGNORED_LINE, none of these lines may occur in any other | |
100 | place. </p> | |
101 | ||
102 | <p>Note: A NOTICE displays differently depending on whether it follows a header or title | |
103 | or is part of a CHAR_ENTRY.</p> | |
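<p>The ordering rules above can be sketched as a small checker that works on a list of element names that have already been classified. This is an illustrative sketch under stated assumptions: the function <code>check_structure</code> and its error messages are hypothetical, and COMMENT_LINE is treated as not interrupting a character entry because the grammar allows it both on the title page and inside a CHAR_ENTRY.</p>

```python
# Lines that may only appear inside a character entry (per the prose rules).
FOLLOWERS = {"ALIAS_LINE", "CROSS_REF", "DECOMPOSITION", "COMPAT_MAPPING"}
# Lines that are accepted anywhere and do not end a character entry.
# Including COMMENT_LINE here is an assumption of this sketch.
PASS_THROUGH = {"NOTICE", "EMPTY_LINE", "IGNORED_LINE", "COMMENT_LINE"}


def check_structure(kinds):
    """Return a list of (index, message) pairs, one per rule violation."""
    errors = []
    seen_block = False  # True once the first BLOCKHEADER has been read
    in_entry = False    # True while inside a NAME_LINE/RESERVED_LINE entry
    for i, kind in enumerate(kinds):
        if kind == "BLOCKHEADER":
            seen_block, in_entry = True, False
        elif kind in ("TITLE", "SUBTITLE"):
            if seen_block:
                errors.append((i, kind + " after first BLOCKHEADER"))
            in_entry = False
        elif kind in ("NAME_LINE", "RESERVED_LINE"):
            in_entry = True
        elif kind in FOLLOWERS:
            if not in_entry:
                errors.append((i, kind + " outside a character entry"))
        elif kind in PASS_THROUGH:
            pass
        else:
            in_entry = False  # e.g. SUBHEADER or PAGEBREAK ends the entry
    return errors
```

<p>For example, <code>check_structure(["BLOCKHEADER", "SUBHEADER", "ALIAS_LINE"])</code> reports the ALIAS_LINE, because no NAME_LINE or RESERVED_LINE precedes it.</p>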
104 | ||
105 | <h3>1.3 NamesList File Elements</h3> | |
106 | ||
107 | <p>This section provides the details of the syntax for the individual elements.</p> | |
108 | ||
109 | <pre><small><strong>ELEMENT SYNTAX</strong> // How rendered</small></pre> | |
110 | ||
111 | <pre><small><strong>NAME_LINE: CHAR <tab> LINE | |
112 | </strong> // the CHAR and the corresponding image are echoed, | |
113 | // followed by the name as given in LINE | |
114 | ||
115 | <strong> CHAR <tab> NAME COMMENT LF | |
116 | </strong> // Names may have a comment, which is stripped off | |
117 | // unless the file is parsed for an ISO style list | |
118 | ||
119 | <strong>RESERVED_LINE: CHAR <tab> <reserved> | |
120 | </strong> // the CHAR is echoed followed by an icon for the | |
121 | // reserved character and a fixed string e.g. <reserved> | |
122 | ||
123 | <strong>COMMENT_LINE: <tab> "*" SP EXPAND_LINE | |
124 | </strong> // * is replaced by BULLET, output line as comment | |
125 | <strong><tab> EXPAND_LINE</strong> | |
126 | // output line as comment | |
127 | ||
128 | <strong>ALIAS_LINE: <tab> "=" SP LINE | |
129 | </strong> // replace = by itself, output line as alias | |
130 | ||
131 | <strong>CROSS_REF: <tab> "X" SP EXPAND_LINE | |
132 | </strong> // X is replaced by a right arrow | |
133 | <strong> <tab> "X" SP "(" STRING SP "-" SP CHAR ")" | |
134 | </strong> // X is replaced by a right arrow | |
135 | // the "(", "-", ")" are removed, the | |
136 | // order of CHAR and STRING is reversed | |
137 | // i.e. both inputs result in the same output | |
138 | ||
139 | <strong>IGNORED_LINE: <tab> ";" EXPAND_LINE | |
140 | EMPTY_LINE: LF | |
141 | </strong> // empty lines and file comments are ignored | |
142 | ||
143 | <strong>DECOMPOSITION: <tab> ":" EXPAND_LINE | |
144 | </strong> // replace ':' by EQUIV, expand line into | |
145 | // decomposition | |
146 | ||
147 | <strong>COMPAT_MAPPING: <tab> "#" SP EXPAND_LINE | |
148 | </strong> // replace '#' by APPROX, output line as mapping | |
149 | ||
150 | <strong>NOTICE: "@+" <tab> LINE | |
151 | </strong> // skip '@+', output text as notice | |
152 | <strong> "@+" <tab> "*" SP LINE | |
153 | </strong> // skip '@+', output text as notice | |
154 | // "*" expands to a bullet character | |
155 | // Notices following a character code apply to the | |
156 | // character and are indented. Notices not following | |
157 | // a character code apply to the page/block/column | |
158 | // and are italicized, but not indented | |
159 | ||
160 | <strong>SUBTITLE: "@@@+" <tab> LINE | |
161 | </strong> // skip "@@@+", output text as subtitle | |
162 | ||
163 | <strong>SUBHEADER: "@" <tab> LINE | |
164 | </strong> // skip '@', output line as column header | |
165 | ||
166 | <strong>BLOCKHEADER: "@@" <tab> BLOCKSTART <tab> BLOCKNAME <tab> BLOCKEND | |
167 | </strong> // skip "@@", cause a page break and optional | |
168 | // blank page, then output one or more charts | |
169 | // followed by the list of character names. | |
170 | // use BLOCKSTART and BLOCKEND to define | |
171 | // which characters belong to a block | |
172 | // use BLOCKNAME in page and table headers | |
173 | <strong> "@@" <tab> BLOCKSTART <tab> BLOCKNAME COMMENT <tab> BLOCKEND | |
174 | </strong>// if a comment is present it replaces the blockname | |
175 | // when an ISO-style namelist is laid out | |
176 | ||
177 | <strong>BLOCKSTART: CHAR</strong> // first character position in block | |
178 | <strong>BLOCKEND: CHAR</strong> // last character position in block | |
179 | <strong>PAGEBREAK: "@@"</strong> // insert a (column) break | |
180 | ||
181 | <strong>TITLE: "@@@" <tab> LINE</strong> | |
182 | // skip "@@@", output line as text | |
183 | // Title is used in page headers | |
184 | ||
185 | <strong>EXPAND_LINE: {CHAR | STRING}+ LF </strong> | |
186 | // all instances of CHAR *) are replaced by | |
187 | // CHAR NBSP x NBSP where x is the single Unicode | |
188 | // character corresponding to CHAR | |
189 | // If the character is combining, it is replaced with | |
190 | // CHAR NBSP <circ> x NBSP where <circ> is the | |
191 | // dotted circle</small> | |
192 | </pre> | |
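<p>One way to picture the element syntax is as a classifier that inspects the start of each line. The sketch below is an assumption-laden illustration, not the parser used by the layout program: the function name <code>classify</code> is hypothetical, a CHAR is assumed to be four or more upper-case hexadecimal digits, and a semicolon in the first column is treated as an ignored file comment, as in the overview.</p>

```python
import re

# A line that starts with a CHAR field followed by one or more tabs.
CHAR_FIELD = re.compile(r"^[0-9A-F]{4,}\t+(.*)$")

# "@" prefixes, longest first so that "@@@+" is not mistaken for "@@@" or "@@".
AT_PREFIXES = (
    ("@@@+\t", "SUBTITLE"),
    ("@@@\t", "TITLE"),
    ("@@\t", "BLOCKHEADER"),
    ("@+\t", "NOTICE"),
    ("@\t", "SUBHEADER"),
)

# Markers that may follow the leading tab(s) of an annotation line.
TAB_PREFIXES = (
    ("* ", "COMMENT_LINE"),
    ("= ", "ALIAS_LINE"),
    ("X ", "CROSS_REF"),
    (";", "IGNORED_LINE"),
    (":", "DECOMPOSITION"),
    ("# ", "COMPAT_MAPPING"),
)


def classify(line):
    """Return the element name for one input line, or "UNKNOWN"."""
    line = line.rstrip("\r\n")
    if line == "":
        return "EMPTY_LINE"
    if line.startswith(";"):
        return "IGNORED_LINE"
    for prefix, kind in AT_PREFIXES:
        if line.startswith(prefix):
            return kind
    if line == "@@":
        return "PAGEBREAK"
    if line.startswith("\t"):
        body = line.lstrip("\t")
        for prefix, kind in TAB_PREFIXES:
            if body.startswith(prefix):
                return kind
        return "COMMENT_LINE"  # plain <tab> EXPAND_LINE form
    m = CHAR_FIELD.match(line)
    if m:
        return "RESERVED_LINE" if m.group(1) == "<reserved>" else "NAME_LINE"
    return "UNKNOWN"
```

<p>Checking the longer "@" prefixes first is what keeps a SUBTITLE or TITLE from being read as a BLOCKHEADER.</p>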
193 | ||
194 | <h3><strong>1.4 NamesList File Primitives</strong></h3> | |
195 | ||
196 | <p>The following are the primitives and terminals for the NamesList syntax.</p> | |
197 | ||
198 | <pre><small><strong>LINE: STRING LF | |
199 | COMMENT: "(" NAME ")" | |
200 | "(" NAME ")" "*" | |
201 | </strong> | |
202 | <strong>NAME</strong>: <sequence of ASCII characters, except "(" or ")" > | |
203 | <strong>STRING</strong>: <sequence of Latin-1 characters> | |
204 | <strong>CHAR</strong>: <strong>X X X X</strong> | |
205 | <strong>| X X X X X X X X X</strong></small> | |
206 | <small><strong>X: "0"|"1"|"2"|"3"|"4"|"5"|"6"|"7"|"8"|"9"|"A"|"B"|"C"|"D"|"E"|"F" | |
207 | <tab>:</strong> <sequence of one or more ASCII tab characters 0x09> | |
208 | <strong>SP</strong>: <ASCII 0x20> | |
209 | <strong>LF</strong>: <any sequence of ASCII 0x0A and 0x0D> | |
210 | </small></pre> | |
211 | ||
212 | <p><strong>Notes:</strong> | |
213 | ||
214 | <ul> | |
215 | <li>Special lookahead logic prevents a mention of a four-digit standard number, such as ISO 9999, from | |
216 | being misinterpreted as ISO CHAR.</li> | |
217 | <li>Use of Latin-1 is supported in unibook.exe, but not portably, unless the file is encoded as | |
218 | UTF-16LE.</li> | |
219 | <li>The final LF in the file must be present.</li> | |
220 | <li>A CHAR inside ' or " is expanded, but only its glyph image is printed; the | |
221 | code value is not echoed.</li> | |
222 | <li>Straight quotes in an EXPAND_LINE are replaced by curly quotes using English rules. | |
223 | Apostrophes are supported, but nested quotes are not.</li> | |
224 | </ul> | |
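<p>The EXPAND_LINE substitution and the first note can be sketched together in Python. Everything here is an approximation under stated assumptions: the function <code>expand_line</code> is hypothetical, only four-digit CHAR values are handled, a negative lookbehind for "ISO " stands in for the unspecified lookahead logic, and a character is treated as combining when its canonical combining class is non-zero. Quote handling is not covered.</p>

```python
import re
import unicodedata

NBSP = "\u00A0"           # NO-BREAK SPACE
DOTTED_CIRCLE = "\u25CC"  # base glyph shown under combining marks

# Four upper-case hex digits standing alone, unless directly preceded by
# "ISO " (assumed stand-in for the special lookahead logic in the notes).
CHAR = re.compile(r"(?<!ISO )\b[0-9A-F]{4}\b")


def expand_line(text):
    """Replace each CHAR with CHAR NBSP x NBSP, adding a dotted circle
    before x when x is a combining character."""
    def repl(match):
        code = match.group(0)
        ch = chr(int(code, 16))
        if unicodedata.combining(ch):
            return code + NBSP + DOTTED_CIRCLE + ch + NBSP
        return code + NBSP + ch + NBSP
    return CHAR.sub(repl, text)
```

<p>Note that this simple pattern would also expand an upper-case word made only of the letters A to F, or a four-digit year, so a real implementation would need stricter context checks.</p>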
225 | </body> | |
226 | </html> |