Commit | Line | Data |
---|---|---|
9378c581 | 1 | |
f918d677 | 2 | # Time-stamp: "2003-04-02 11:10:32 AHST" |
9378c581 JH |
3 | |
4 | =head1 NAME | |
5 | ||
f918d677 | 6 | Locale::Maketext - framework for localization |
9378c581 JH |
7 | |
8 | =head1 SYNOPSIS | |
9 | ||
10 | package MyProgram; | |
11 | use strict; | |
12 | use MyProgram::L10N; | |
13 | # ...which inherits from Locale::Maketext | |
14 | my $lh = MyProgram::L10N->get_handle() || die "What language?"; | |
15 | ... | |
16 | # And then any messages your program emits, like: | |
17 | warn $lh->maketext( "Can't open file [_1]: [_2]\n", $f, $! ); | |
18 | ... | |
19 | ||
20 | =head1 DESCRIPTION | |
21 | ||
22 | It is a common feature of applications (whether run directly, | |
23 | or via the Web) for them to be "localized" -- i.e., for them | |
24 | to present an English interface to an English-speaker, a German | |
25 | interface to a German-speaker, and so on for all languages it's | |
26 | programmed with. Locale::Maketext | |
27 | is a framework for software localization; it provides you with the | |
28 | tools for organizing and accessing the bits of text and text-processing | |
29 | code that you need for producing localized applications. | |
30 | ||
31 | In order to make sense of Maketext and how all its | |
32 | components fit together, you should probably | |
33 | go read L<Locale::Maketext::TPJ13|Locale::Maketext::TPJ13>, and | |
34 | I<then> read the following documentation. | |
35 | ||
36 | You may also want to read over the source for C<File::Findgrep> | |
37 | and its constituent modules -- they are a complete (if small) | |
38 | example application that uses Maketext. | |
39 | ||
40 | =head1 QUICK OVERVIEW | |
41 | ||
42 | The basic design of Locale::Maketext is object-oriented, and | |
43 | Locale::Maketext is an abstract base class, from which you | |
44 | derive a "project class". | |
45 | The project class (with a name like "TkBocciBall::Localize", | |
46 | which you then use in your module) is in turn the base class | |
47 | for all the "language classes" for your project | |
48 | (with names "TkBocciBall::Localize::it", | |
49 | "TkBocciBall::Localize::en", | |
50 | "TkBocciBall::Localize::fr", etc.). | |
51 | ||
52 | A language class is | |
53 | a class containing a lexicon of phrases as class data, | |
54 | and possibly also some methods that are of use in interpreting | |
55 | phrases in the lexicon, or otherwise dealing with text in that | |
56 | language. | |
57 | ||
58 | An object belonging to a language class is called a "language | |
59 | handle"; it's typically a flyweight object. | |
60 | ||
61 | The normal course of action is to call: | |
62 | ||
63 | use TkBocciBall::Localize; # the localization project class | |
64 | $lh = TkBocciBall::Localize->get_handle(); | |
65 | # Depending on the user's locale, etc., this will | |
66 | # make a language handle from among the classes available, | |
67 | # and any defaults that you declare. | |
68 | die "Couldn't make a language handle??" unless $lh; | |
69 | ||
70 | From then on, you use the C<maketext> method to access | |
71 | entries in whatever lexicon(s) belong to the language handle | |
72 | you got. So, this: | |
73 | ||
74 | print $lh->maketext("You won!"), "\n"; | |
75 | ||
76 | ...emits the right text for this language. If the object | |
77 | in C<$lh> belongs to class "TkBocciBall::Localize::fr" and | |
78 | %TkBocciBall::Localize::fr::Lexicon contains C<("You won!" | |
79 | =E<gt> "Tu as gagnE<eacute>!")>, then the above | |
80 | code happily tells the user "Tu as gagnE<eacute>!". | |
81 | ||
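The overview above can be tried end to end in a single file. This is only a minimal sketch: it assumes the project class and both language classes are declared inline (instead of living in their own .pm files), and it sets @ISA by hand where a real project would more likely say C<use base>. The lexicon contents are illustrative.

```perl
use strict;
use warnings;

# Project class: derives from the abstract base class.
package TkBocciBall::Localize;
use Locale::Maketext;
our @ISA = ('Locale::Maketext');

# Language classes: each derives from the project class and
# carries its phrases as class data in %Lexicon.
package TkBocciBall::Localize::en;
our @ISA = ('TkBocciBall::Localize');
our %Lexicon = ( 'You won!' => 'You won!' );

package TkBocciBall::Localize::fr;
our @ISA = ('TkBocciBall::Localize');
our %Lexicon = ( 'You won!' => "Tu as gagn\xE9!" );   # \xE9 is e-acute

package main;

# Ask explicitly for French; the result is a language handle.
my $lh = TkBocciBall::Localize->get_handle('fr')
    || die "Couldn't make a language handle??";

my $text = $lh->maketext('You won!');   # looked up in the fr lexicon
print ref($lh), "\n";                   # the class the handle belongs to
```

Because the classes already exist in memory here, C<get_handle> has nothing to load from disk; with one file per class the calling code would look the same.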
82 | =head1 METHODS | |
83 | ||
84 | Locale::Maketext offers a variety of methods, which fall | |
85 | into three categories: | |
86 | ||
87 | =over | |
88 | ||
89 | =item * | |
90 | ||
91 | Methods to do with constructing language handles. | |
92 | ||
93 | =item * | |
94 | ||
95 | C<maketext> and other methods to do with accessing %Lexicon data | |
96 | for a given language handle. | |
97 | ||
98 | =item * | |
99 | ||
100 | Methods that you may find it handy to use, from routines of | |
101 | yours that you put in %Lexicon entries. | |
102 | ||
103 | =back | |
104 | ||
105 | These are covered in the following sections. | |
106 | ||
107 | =head2 Construction Methods | |
108 | ||
109 | These are to do with constructing a language handle: | |
110 | ||
111 | =over | |
112 | ||
f918d677 | 113 | =item * |
5dc6f178 JH |
114 | |
115 | $lh = YourProjClass->get_handle( ...langtags... ) || die "lg-handle?"; | |
9378c581 JH |
116 | |
117 | This tries loading classes based on the language-tags you give (like | |
118 | C<("en-US", "sk", "kon", "es-MX", "ja", "i-klingon")>), and for the first class | |
119 | that succeeds, returns YourProjClass::I<language>->new(). | |
120 | ||
121 | If it runs thru the entire given list of language-tags and finds no classes | |
122 | for those exact terms, it then tries "superordinate" language classes. | |
123 | So if no "en-US" class (i.e., YourProjClass::en_us) | |
124 | was found, nor classes for anything else in that list, we then try | |
125 | its superordinate, "en" (i.e., YourProjClass::en), and so on thru | |
126 | the other language-tags in the given list: "es". | |
127 | (The other language-tags in our example list | |
128 | happen to have no superordinates.) | |
129 | ||
130 | If none of those language-tags leads to loadable classes, we then | |
131 | try classes derived from YourProjClass->fallback_languages() and | |
132 | then if nothing comes of that, we use classes named by | |
133 | YourProjClass->fallback_language_classes(). Then in the (probably | |
134 | quite unlikely) event that that fails, we just return undef. | |
135 | ||
5dc6f178 JH |
136 | =item * |
137 | ||
138 | $lh = YourProjClass->get_handleB<()> || die "lg-handle?"; | |
9378c581 JH |
139 | |
140 | When C<get_handle> is called with an empty parameter list, magic happens: | |
141 | ||
142 | If C<get_handle> senses that it's running in a program that was | |
143 | invoked as a CGI, then it tries to get language-tags out of the | |
144 | environment variable "HTTP_ACCEPT_LANGUAGE", and it pretends that | |
145 | those were the languages passed as parameters to C<get_handle>. | |
146 | ||
147 | Otherwise (i.e., if not a CGI), this tries various OS-specific ways | |
148 | to get the language-tags for the current locale/language, and then | |
f918d677 | 149 | pretends that those were the value(s) passed to C<get_handle>. |
9378c581 JH |
150 | |
151 | Currently this OS-specific stuff consists of looking in the environment | |
152 | variables "LANG" and "LANGUAGE"; and on MSWin machines (where those | |
153 | variables are typically unused), this also tries using | |
154 | the module Win32::Locale to get a language-tag for whatever language/locale | |
155 | is currently selected in the "Regional Settings" (or "International"?) | |
156 | Control Panel. I welcome further | |
157 | suggestions for making this do the Right Thing under other operating | |
158 | systems that support localization. | |
159 | ||
160 | If you're using localization in an application that keeps a configuration | |
161 | file, you might consider something like this in your project class: | |
162 | ||
163 | sub get_handle_via_config { | |
164 | my $class = $_[0]; | |
165 | my $preferred_language = $Config_settings{'language'}; | |
166 | my $lh; | |
167 | if($preferred_language) { | |
168 | $lh = $class->get_handle($preferred_language) | |
169 | || die "No language handle for \"$preferred_language\" or the like"; | |
170 | } else { | |
171 | # Config file missing, maybe? | |
172 | $lh = $class->get_handle() | |
173 | || die "Can't get a language handle"; | |
174 | } | |
175 | return $lh; | |
176 | } | |
177 | ||
5dc6f178 JH |
178 | =item * |
179 | ||
180 | $lh = YourProjClass::langname->new(); | |
9378c581 JH |
181 | |
182 | This constructs a language handle. You usually B<don't> call this | |
183 | directly, but instead let C<get_handle> find a language class to C<use> | |
184 | and to then call ->new on. | |
185 | ||
5dc6f178 JH |
186 | =item * |
187 | ||
188 | $lh->init(); | |
9378c581 JH |
189 | |
190 | This is called by ->new to initialize newly-constructed language handles. | |
191 | If you define an init method in your class, remember that it's usually | |
192 | considered a good idea to call $lh->SUPER::init in it (presumably at the | |
193 | beginning), so that all classes get a chance to initialize a new object | |
194 | however they see fit. | |
195 | ||
5dc6f178 JH |
196 | =item * |
197 | ||
198 | YourProjClass->fallback_languages() | |
9378c581 JH |
199 | |
200 | C<get_handle> appends the return value of this to the end of | |
201 | whatever list of languages you pass C<get_handle>. Unless | |
202 | you override this method, your project class | |
203 | will inherit Locale::Maketext's C<fallback_languages>, which | |
204 | currently returns C<('i-default', 'en', 'en-US')>. | |
205 | ("i-default" is defined in RFC 2277). | |
206 | ||
207 | This method (by having it return the name | |
208 | of a language-tag that has an existing language class) | |
209 | can be used for making sure that | |
210 | C<get_handle> will always manage to construct a language | |
211 | handle (assuming your language classes are in an appropriate | |
212 | @INC directory). Or you can use the next method: | |
213 | ||
5dc6f178 JH |
214 | =item * |
215 | ||
216 | YourProjClass->fallback_language_classes() | |
9378c581 JH |
217 | |
218 | C<get_handle> appends the return value of this to the end | |
219 | of the list of classes it will try using. Unless | |
220 | you override this method, your project class | |
221 | will inherit Locale::Maketext's C<fallback_language_classes>, | |
222 | which currently returns an empty list, C<()>. | |
223 | By setting this to some value (namely, the name of a loadable | |
224 | language class), you can be sure that | |
225 | C<get_handle> will always manage to construct a language | |
226 | handle. | |
227 | ||
228 | =back | |
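The lookup order described for C<get_handle> can be watched in a small experiment. This is a sketch under stated assumptions: Hypo::L10N and its two language classes are hypothetical names declared inline, only "en" and "es" classes exist, and C<fallback_languages> is left at its inherited default. The last call pretends the program was started with nothing but LANG=es_MX in its environment.

```perl
use strict;
use warnings;

package Hypo::L10N;                 # hypothetical project class
use Locale::Maketext;
our @ISA = ('Locale::Maketext');

package Hypo::L10N::en;             # only "en" and "es" classes exist
our @ISA = ('Hypo::L10N');
our %Lexicon = ( 'Hello' => 'Hello' );

package Hypo::L10N::es;
our @ISA = ('Hypo::L10N');
our %Lexicon = ( 'Hello' => 'Hola' );

package main;

# There is no es_mx class, so the superordinate "es" is used.
my $from_super = Hypo::L10N->get_handle('es-MX');

# Neither "sk" nor "kon" has a class (or a superordinate), so the
# lookup should end up at "en", which the inherited fallback list names.
my $from_fallback = Hypo::L10N->get_handle('sk', 'kon');

# Empty parameter list: language-tags come from the environment.
my $from_env;
{
    local %ENV = ( LANG => 'es_MX' );   # not a CGI; only LANG is set
    $from_env = Hypo::L10N->get_handle();
}

print ref($_), "\n" for $from_super, $from_fallback, $from_env;
```

Overriding C<fallback_languages> in Hypo::L10N would change what the second call ends up with; the other two calls never get that far.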
229 | ||
230 | =head2 The "maketext" Method | |
231 | ||
232 | This is the most important method in Locale::Maketext: | |
233 | ||
234 | $text = $lh->maketext(I<key>, ...parameters for this phrase...); | |
235 | ||
236 | This looks in the %Lexicon of the language handle | |
237 | $lh and all its superclasses, looking | |
238 | for an entry whose key is the string I<key>. Assuming such | |
239 | an entry is found, various things then happen, depending on the | |
240 | value found: | |
241 | ||
242 | If the value is a scalarref, the scalar is dereferenced and returned | |
243 | (and any parameters are ignored). | |
244 | If the value is a coderef, we return &$value($lh, ...parameters...). | |
245 | If the value is a string that I<doesn't> look like it's in Bracket Notation, | |
246 | we return it (after replacing it with a scalarref, in its %Lexicon). | |
247 | If the value I<does> look like it's in Bracket Notation, then we compile | |
248 | it into a sub, replace the string in the %Lexicon with the new coderef, | |
249 | and then we return &$new_sub($lh, ...parameters...). | |
250 | ||
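The four kinds of lexicon value can be put side by side. This is a minimal sketch, assuming a hypothetical project class Hypo::Msgs declared inline and made-up keys; the last line peeks into the lexicon to see what the Bracket Notation string was replaced with after its first use.

```perl
use strict;
use warnings;

package Hypo::Msgs;                  # hypothetical project class
use Locale::Maketext;
our @ISA = ('Locale::Maketext');

package Hypo::Msgs::en;
our @ISA = ('Hypo::Msgs');
our %Lexicon = (
    # scalarref: dereferenced and returned; parameters are ignored
    'banner' => \'Welcome [not a bracket group]',
    # coderef: called with the language handle, then the parameters
    'shout'  => sub { my ($lh, @words) = @_; return uc join ' ', @words },
    # plain string with no brackets: returned as is
    'bye'    => 'Goodbye.',
    # Bracket Notation: compiled into a sub on first use
    'greet'  => 'Hello, [_1]!',
);

package main;

my $lh = Hypo::Msgs->get_handle('en') || die "lg-handle?";

my $banner = $lh->maketext('banner', 'ignored');
my $shout  = $lh->maketext('shout', 'you', 'won');
my $bye    = $lh->maketext('bye');
my $greet  = $lh->maketext('greet', 'World');

# What now sits in the lexicon under 'greet'
my $cached = ref $Hypo::Msgs::en::Lexicon{'greet'};
print "$_\n" for $banner, $shout, $bye, $greet, $cached;
```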
251 | Bracket Notation is discussed in a later section. Note | |
252 | that compiling a Bracket Notation string can throw | |
253 | an exception if the string is not syntactically valid (say, by not | |
254 | balancing brackets right). | |
255 | ||
256 | Also, calling &$coderef($lh, ...parameters...) can throw any sort of | |
257 | exception (if, say, code in that sub tries to divide by zero). But | |
258 | a very common exception occurs when you have Bracket | |
259 | Notation text that says to call a method "foo", but there is no such | |
260 | method. (E.g., "You have [quaB<tn>,_1,ball]." will throw an exception | |
261 | on trying to call $lh->quaB<tn>($_[1],'ball') -- you presumably meant | |
262 | "quant".) C<maketext> catches these exceptions, but only to make the | |
263 | error message more readable, at which point it rethrows the exception. | |
264 | ||
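The misspelled-method case is easy to reproduce. A minimal sketch, assuming a hypothetical project class Hypo::Err declared inline with two made-up keys, one of which contains the typo:

```perl
use strict;
use warnings;

package Hypo::Err;                   # hypothetical project class
use Locale::Maketext;
our @ISA = ('Locale::Maketext');

package Hypo::Err::en;
our @ISA = ('Hypo::Err');
our %Lexicon = (
    'typo'  => 'You have [quatn,_1,ball].',   # misspelled method name
    'fixed' => 'You have [quant,_1,ball].',
);

package main;

my $lh = Hypo::Err->get_handle('en') || die "lg-handle?";

my $ok = $lh->maketext('fixed', 3);           # quant exists, so this works

# There is no method "quatn", so this call throws; eval traps it.
my $out = eval { $lh->maketext('typo', 3) };
my $err = $@;
print "maketext failed:\n$err\n" unless defined $out;
```

The exact wording of the message may differ between versions of the module, so code that inspects it should match loosely (here, on the method name).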
265 | An exception I<may> be thrown if I<key> is not found in any | |
266 | of $lh's %Lexicon hashes. What happens if a key is not found, | |
267 | is discussed in a later section, "Controlling Lookup Failure". | |
268 | ||
269 | Note that you might find it useful in some cases to override | |
270 | the C<maketext> method with an "after method", if you want to | |
271 | translate encodings, or even scripts: | |
272 | ||
273 | package YrProj::zh_cn; # Chinese with PRC-style glyphs | |
274 | use base ('YrProj::zh_tw'); # Taiwan-style | |
275 | sub maketext { | |
276 | my $self = shift(@_); | |
277 | my $value = $self->SUPER::maketext(@_); | |
278 | return Chineeze::taiwan2mainland($value); | |
279 | } | |
280 | ||
281 | Or you may want to override it with something that traps | |
282 | any exceptions, if that's critical to your program: | |
283 | ||
284 | sub maketext { | |
285 | my($lh, @stuff) = @_; | |
286 | my $out; | |
287 | eval { $out = $lh->SUPER::maketext(@stuff) }; | |
288 | return $out unless $@; | |
289 | ...otherwise deal with the exception... | |
290 | } | |
291 | ||
292 | Other than those two situations, I don't imagine that | |
293 | it's useful to override the C<maketext> method. (If | |
294 | you run into a situation where it is useful, I'd be | |
295 | interested in hearing about it.) | |
296 | ||
297 | =over | |
298 | ||
299 | =item $lh->fail_with I<or> $lh->fail_with(I<PARAM>) | |
300 | ||
301 | =item $lh->failure_handler_auto | |
302 | ||
303 | These two methods are discussed in the section "Controlling | |
304 | Lookup Failure". | |
305 | ||
306 | =back | |
307 | ||
308 | =head2 Utility Methods | |
309 | ||
310 | These are methods that you may find it handy to use, generally | |
311 | from %Lexicon routines of yours (whether expressed as | |
312 | Bracket Notation or not). | |
313 | ||
314 | =over | |
315 | ||
316 | =item $language->quant($number, $singular) | |
317 | ||
318 | =item $language->quant($number, $singular, $plural) | |
319 | ||
320 | =item $language->quant($number, $singular, $plural, $negative) | |
321 | ||
322 | This is generally meant to be called from inside Bracket Notation | |
323 | (which is discussed later), as in | |
324 | ||
325 | "Your search matched [quant,_1,document]!" | |
326 | ||
327 | It's for I<quantifying> a noun (i.e., saying how much of it there is, | |
f918d677 | 328 | while giving the correct form of it). The behavior of this method is |
9378c581 JH |
329 | handy for English and a few other Western European languages, and you |
330 | should override it for languages where it's not suitable. You can feel | |
331 | free to read the source, but the current implementation is basically | |
332 | as this pseudocode describes: | |
333 | ||
334 | if $number is 0 and there's a $negative, | |
335 | return $negative; | |
336 | elsif $number is 1, | |
337 | return "1 $singular"; | |
338 | elsif there's a $plural, | |
339 | return "$number $plural"; | |
340 | else | |
341 | return "$number " . $singular . "s"; | |
342 | # | |
343 | # ...except that we actually call numf to | |
344 | # stringify $number before returning it. | |
345 | ||
346 | So for English (with Bracket Notation) | |
347 | C<"...[quant,_1,file]..."> is fine (for 0 it returns "0 files", | |
348 | for 1 it returns "1 file", and for more it returns "2 files", etc.) | |
349 | ||
f918d677 | 350 | But for "directory", you'd want C<"[quant,_1,directory,directories]"> |
9378c581 JH |
351 | so that our elementary C<quant> method doesn't think that the |
352 | plural of "directory" is "directorys". And you might find that the | |
353 | output may sound better if you specify a negative form, as in: | |
354 | ||
355 | "[quant,_1,file,files,No files] matched your query.\n" | |
356 | ||
357 | Remember to keep in mind verb agreement (or adjectives too, in | |
358 | other languages), as in: | |
359 | ||
360 | "[quant,_1,document] were matched.\n" | |
361 | ||
362 | Because if _1 is one, you get "1 document B<were> matched". | |
363 | An acceptable hack here is to do something like this: | |
364 | ||
365 | "[quant,_1,document was,documents were] matched.\n" | |
366 | ||
367 | =item $language->numf($number) | |
368 | ||
369 | This returns the given number formatted nicely according to | |
370 | this language's conventions. Maketext's default method is | |
371 | mostly to just take the normal string form of the number | |
372 | (applying sprintf "%G" for only very large numbers), and then | |
373 | to add commas as necessary. (Except that | |
374 | we apply C<tr/,./.,/> if $language->{'numf_comma'} is true; | |
375 | that's a bit of a hack that's useful for languages that express | |
376 | two million as "2.000.000" and not as "2,000,000"). | |
377 | ||
378 | If you want anything fancier, consider overriding this with something | |
379 | that uses L<Number::Format|Number::Format>, or does something else | |
380 | entirely. | |
381 | ||
382 | Note that numf is called by quant for stringifying all quantifying | |
383 | numbers. | |
384 | ||
385 | =item $language->sprintf($format, @items) | |
386 | ||
387 | This is just a wrapper around Perl's normal C<sprintf> function. | |
388 | It's provided so that you can use "sprintf" in Bracket Notation: | |
389 | ||
390 | "Couldn't access datanode [sprintf,%10s=~[%s~],_1,_2]!\n" | |
391 | ||
392 | returning... | |
393 | ||
394 | Couldn't access datanode      Stuff=[thangamabob]! | |
395 | ||
396 | =item $language->language_tag() | |
397 | ||
398 | Currently this just takes the last bit of C<ref($language)>, turns | |
399 | underscores to dashes, and returns it. So if $language is | |
400 | an object of class Hee::HOO::Haw::en_us, $language->language_tag() | |
401 | returns "en-us". (Yes, the usual representation for that language | |
402 | tag is "en-US", but case is I<never> considered meaningful in | |
403 | language-tag comparison.) | |
404 | ||
405 | You may override this as you like; Maketext doesn't use it for | |
406 | anything. | |
407 | ||
408 | =item $language->encoding() | |
409 | ||
410 | Currently this isn't used for anything, but it's provided | |
411 | (with default value of | |
412 | C<(ref($language) && $language-E<gt>{'encoding'}) or "iso-8859-1"> | |
413 | ) as a sort of suggestion that it may be useful/necessary to | |
414 | associate encodings with your language handles (whether on a | |
415 | per-class or even per-handle basis). | |
416 | ||
417 | =back | |
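The utility methods above can be exercised directly. This is a minimal sketch, assuming a hypothetical project class Hypo::Util with a single inline language class en_us and made-up lexicon keys; the values in the comments follow from the pseudocode and descriptions given for each method.

```perl
use strict;
use warnings;

package Hypo::Util;                  # hypothetical project class
use Locale::Maketext;
our @ISA = ('Locale::Maketext');

package Hypo::Util::en_us;
our @ISA = ('Hypo::Util');
our %Lexicon = (
    'files'   => '[quant,_1,file]',
    'dirs'    => '[quant,_1,directory,directories]',
    'matched' => '[quant,_1,file,files,No files] matched your query.',
);

package main;

my $lh = Hypo::Util->get_handle('en-US') || die "lg-handle?";

# quant with only a singular form: plural is made by adding "s"
my @files = map { $lh->maketext('files', $_) } 0, 1, 2;
# quant with an explicit plural, and with a negative form
my $dirs = $lh->maketext('dirs', 3);
my $none = $lh->maketext('matched', 0);

# numf adds commas; with numf_comma set, commas and dots are swapped
my $big = $lh->numf(2000000);
my $dotted;
{
    local $lh->{'numf_comma'} = 1;
    $dotted = $lh->numf(2000000);
}

my $tag = $lh->language_tag;         # derived from the class name
my $enc = $lh->encoding;             # default when none is stored

print "$_\n" for @files, $dirs, $none, $big, $dotted, $tag, $enc;
```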
418 | ||
419 | =head2 Language Handle Attributes and Internals | |
420 | ||
421 | A language handle is a flyweight object -- i.e., it doesn't (necessarily) | |
422 | carry any data of interest, other than just being a member of | |
423 | whatever class it belongs to. | |
424 | ||
425 | A language handle is implemented as a blessed hash. Subclasses of yours | |
426 | can store whatever data you want in the hash. Currently the only hash | |
427 | entry used by any crucial Maketext method is "fail", so feel free to | |
428 | use anything else as you like. | |
429 | ||
430 | B<Remember: Don't be afraid to read the Maketext source if there's | |
431 | any point on which this documentation is unclear.> This documentation | |
432 | is vastly longer than the module source itself. | |
433 | ||
434 | =over | |
435 | ||
436 | =back | |
437 | ||
438 | =head1 LANGUAGE CLASS HIERARCHIES | |
439 | ||
440 | These are Locale::Maketext's assumptions about the class | |
441 | hierarchy formed by all your language classes: | |
442 | ||
443 | =over | |
444 | ||
445 | =item * | |
446 | ||
447 | You must have a project base class, which you load, and | |
448 | which you then use as the first argument in | |
449 | the call to YourProjClass->get_handle(...). It should derive | |
450 | (whether directly or indirectly) from Locale::Maketext. | |
451 | It B<doesn't matter> how you name this class, altho assuming this | |
452 | is the localization component of your Super Mega Program, | |
453 | good names for your project class might be | |
454 | SuperMegaProgram::Localization, SuperMegaProgram::L10N, | |
455 | SuperMegaProgram::I18N, SuperMegaProgram::International, | |
456 | or even SuperMegaProgram::Languages or SuperMegaProgram::Messages. | |
457 | ||
458 | =item * | |
459 | ||
460 | Language classes are what YourProjClass->get_handle will try to load. | |
461 | It will look for them by taking each language-tag (B<skipping> it | |
462 | if it doesn't look like a language-tag or locale-tag!), turning it to | |
463 | all lowercase, turning any dashes to underscores, and appending it | |
464 | to YourProjClass . "::". So this: | |
465 | ||
466 | $lh = YourProjClass->get_handle( | |
467 | 'en-US', 'fr', 'kon', 'i-klingon', 'i-klingon-romanized' | |
468 | ); | |
469 | ||
470 | will try loading the classes | |
471 | YourProjClass::en_us (note lowercase!), YourProjClass::fr, | |
472 | YourProjClass::kon, | |
473 | YourProjClass::i_klingon | |
474 | and YourProjClass::i_klingon_romanized. (And it'll stop at the | |
475 | first one that actually loads.) | |
476 | ||
477 | =item * | |
478 | ||
479 | I assume that each language class derives (directly or indirectly) | |
480 | from your project class, and also defines its @ISA, its %Lexicon, | |
481 | or both. But I anticipate no dire consequences if these assumptions | |
482 | do not hold. | |
483 | ||
484 | =item * | |
485 | ||
486 | Language classes may derive from other language classes (altho they | |
487 | should have "use I<Thatclassname>" or "use base qw(I<...classes...>)"). | |
488 | They may derive from the project | |
489 | class. They may derive from some other class altogether. Or via | |
490 | multiple inheritance, they may derive from any mixture of these. | |
491 | ||
492 | =item * | |
493 | ||
494 | I foresee no problems with having multiple inheritance in | |
495 | your hierarchy of language classes. (As usual, however, Perl will | |
496 | complain bitterly if you have a cycle in the hierarchy: i.e., if | |
497 | any class is its own ancestor.) | |
498 | ||
499 | =back | |
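The naming rule for language classes can be written out as a few lines of plain Perl. Note that C<class_for_tag> is a hypothetical helper for illustration, not something Locale::Maketext provides, and it assumes each tag has already passed the "looks like a language-tag" check.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of Locale::Maketext: mirrors the
# naming rule for a single, already-valid language-tag.
sub class_for_tag {
    my ($project_class, $tag) = @_;
    (my $name = lc $tag) =~ tr/-/_/;   # lowercase, then dashes to underscores
    return $project_class . '::' . $name;
}

my @classes = map { class_for_tag('YourProjClass', $_) }
    'en-US', 'fr', 'kon', 'i-klingon', 'i-klingon-romanized';

print "$_\n" for @classes;
```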
500 | ||
501 | =head1 ENTRIES IN EACH LEXICON | |
502 | ||
503 | A typical %Lexicon entry is meant to signify a phrase, | |
504 | taking some number (0 or more) of parameters. An entry | |
505 | is meant to be accessed via | |
506 | a string I<key> in $lh->maketext(I<key>, ...parameters...), | |
507 | which should return a string that is generally meant to | |
508 | be used for "output" to the user -- regardless of whether | |
509 | this actually means printing to STDOUT, writing to a file, | |
510 | or putting into a GUI widget. | |
511 | ||
512 | While the key must be a string value (since that's a basic | |
513 | restriction that Perl places on hash keys), the value in | |
f918d677 | 514 | the lexicon can currently be of several types: |
9378c581 JH |
515 | a defined scalar, scalarref, or coderef. The use of these is |
516 | explained above, in the section 'The "maketext" Method', and | |
517 | Bracket Notation for strings is discussed in the next section. | |
518 | ||
519 | While you can use arbitrary unique IDs for lexicon keys | |
520 | (like "_min_larger_max_error"), it is often | |
521 | useful if an entry's key is itself a valid value, like | |
522 | this example error message: | |
523 | ||
524 | "Minimum ([_1]) is larger than maximum ([_2])!\n", | |
525 | ||
526 | Compare this code that uses an arbitrary ID... | |
527 | ||
528 | die $lh->maketext( "_min_larger_max_error", $min, $max ) | |
529 | if $min > $max; | |
530 | ||
531 | ...to this code that uses a key-as-value: | |
532 | ||
533 | die $lh->maketext( | |
534 | "Minimum ([_1]) is larger than maximum ([_2])!\n", | |
535 | $min, $max | |
536 | ) if $min > $max; | |
537 | ||
538 | The second is, in short, more readable. In particular, it's obvious | |
539 | that the number of parameters you're feeding to that phrase (two) is | |
540 | the number of parameters that it I<wants> to be fed. (Since you see | |
541 | _1 and a _2 being used in the key there.) | |
542 | ||
543 | Also, once a project is otherwise | |
544 | complete and you start to localize it, you can scrape together | |
545 | all the various keys you use, and pass it to a translator; and then | |
546 | the translator's work will go faster if what he's presented is this: | |
547 | ||
548 | "Minimum ([_1]) is larger than maximum ([_2])!\n", | |
549 | => "", # fill in something here, Jacques! | |
550 | ||
551 | rather than this more cryptic mess: | |
552 | ||
553 | "_min_larger_max_error" | |
554 | => "", # fill in something here, Jacques | |
555 | ||
556 | I think that using keys that are themselves lexicon values makes the completed lexicon | |
557 | entries more readable: | |
558 | ||
559 | "Minimum ([_1]) is larger than maximum ([_2])!\n", | |
560 | => "Le minimum ([_1]) est plus grand que le maximum ([_2])!\n", | |
561 | ||
562 | Also, having valid values as keys becomes very useful if you set | |
563 | up an _AUTO lexicon. _AUTO lexicons are discussed in a later | |
564 | section. | |
565 | ||
566 | I almost always use keys that are themselves | |
567 | valid lexicon values. One notable exception is when the value is | |
568 | quite long. For example, to get the screenful of data that | |
569 | a command-line program might returns when given an unknown switch, | |
570 | I often just use a key "_USAGE_MESSAGE". At that point I then go | |
571 | and immediately to define that lexicon entry in the | |
572 | ProjectClass::L10N::en lexicon (since English is always my "project | |
f918d677 | 573 | language"): |
9378c581 JH |
574 | |
575 | '_USAGE_MESSAGE' => <<'EOSTUFF', | |
576 | ...long long message... | |
577 | EOSTUFF | |
578 | ||
579 | and then I can use it as: | |
580 | ||
581 | getopt('oDI', \%opts) or die $lh->maketext('_USAGE_MESSAGE'); | |
582 | ||
583 | Incidentally, | |
584 | note that each class's C<%Lexicon> inherits-and-extends | |
585 | the lexicons in its superclasses. This is not because these are | |
586 | special hashes I<per se>, but because you access them via the | |
587 | C<maketext> method, which looks for entries across all the | |
588 | C<%Lexicon>'s in a language class I<and> all its ancestor classes. | |
589 | (This is because the idea of "class data" isn't directly implemented | |
590 | in Perl, but is instead left to individual class-systems to implement | |
591 | as they see fit.) | |
592 | ||
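The inherit-and-extend behavior of lexicons can be seen with a three-level hierarchy. This is a sketch, assuming a hypothetical project class Hypo::Inherit that carries shared entries, a French class, and a Canadian French class that overrides just one phrase; all names and phrases are illustrative.

```perl
use strict;
use warnings;

package Hypo::Inherit;               # project class with shared entries
use Locale::Maketext;
our @ISA = ('Locale::Maketext');
our %Lexicon = ( 'OK' => 'OK' );

package Hypo::Inherit::fr;
our @ISA = ('Hypo::Inherit');
our %Lexicon = (
    'Cancel'  => 'Annuler',
    'Weekend' => 'Week-end',
);

package Hypo::Inherit::fr_ca;        # extends fr, overrides one entry
our @ISA = ('Hypo::Inherit::fr');
our %Lexicon = ( 'Weekend' => 'Fin de semaine' );

package main;

my $lh = Hypo::Inherit->get_handle('fr-CA') || die "lg-handle?";

my $own      = $lh->maketext('Weekend');   # found in fr_ca itself
my $parent   = $lh->maketext('Cancel');    # found in fr
my $ancestor = $lh->maketext('OK');        # found in the project class

print "$_\n" for $own, $parent, $ancestor;
```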
593 | Note that you may have things stored in a lexicon | |
594 | besides just phrases for output: for example, if your program | |
595 | takes input from the keyboard, asking a "(Y/N)" question, | |
596 | you probably need to know what the equivalent of "Y[es]/N[o]" is | |
597 | in whatever language. You probably also need to know what | |
598 | the equivalents of the answers "y" and "n" are. You can | |
599 | store that information in the lexicon (say, under the keys | |
600 | "~answer_y" and "~answer_n", and the long forms as | |
601 | "~answer_yes" and "~answer_no", where "~" is just an ad-hoc | |
602 | character meant to indicate to programmers/translators that | |
603 | these are not phrases for output). | |
604 | ||
605 | Or instead of storing this in the language class's lexicon, | |
606 | you can (and, in some cases, really should) represent the same bit | |
607 | of knowledge as code in a method in the language class. (That | |
608 | leaves a tidy distinction between the lexicon as the things we | |
609 | know how to I<say>, and the rest of the things in the language class | |
610 | as things that we know how to I<do>.) Consider | |
611 | this example of a processor for responses to French "oui/non" | |
612 | questions: | |
613 | ||
614 | sub y_or_n { | |
615 | return undef unless defined $_[1] and length $_[1]; | |
616 | my $answer = lc $_[1]; # smash case | |
617 | return 1 if $answer eq 'o' or $answer eq 'oui'; | |
618 | return 0 if $answer eq 'n' or $answer eq 'non'; | |
619 | return undef; | |
620 | } | |
621 | ||
622 | ...which you'd then call in a construct like this: | |
623 | ||
624 | my $response; | |
625 | until(defined $response) { | |
626 | print $lh->maketext("Open the pod bay door (y/n)? "); | |
627 | $response = $lh->y_or_n( get_input_from_keyboard_somehow() ); | |
628 | } | |
629 | if($response) { $pod_bay_door->open() } | |
630 | else { $pod_bay_door->leave_closed() } | |
631 | ||
632 | Other data worth storing in a lexicon might be things like | |
633 | filenames for language-targeted resources: | |
634 | ||
635 | ... | |
636 | "_main_splash_png" | |
637 | => "/styles/en_us/main_splash.png", | |
638 | "_main_splash_imagemap" | |
639 | => "/styles/en_us/main_splash.incl", | |
640 | "_general_graphics_path" | |
641 | => "/styles/en_us/", | |
642 | "_alert_sound" | |
643 | => "/styles/en_us/hey_there.wav", | |
644 | "_forward_icon" | |
645 | => "left_arrow.png", | |
646 | "_backward_icon" | |
647 | => "right_arrow.png", | |
648 | # In some other languages, left equals | |
649 | # BACKwards, and right is FOREwards. | |
650 | ... | |
651 | ||
652 | You might want to do the same thing for expressing key bindings | |
653 | or the like (since hardwiring "q" as the binding for the function | |
654 | that quits a screen/menu/program is useful only if your language | |
655 | happens to associate "q" with "quit"!) | |
656 | ||
657 | =head1 BRACKET NOTATION | |
658 | ||
659 | Bracket Notation is a crucial feature of Locale::Maketext. I mean | |
660 | Bracket Notation to provide a replacement for sprintf formatting. | |
661 | Everything you do with Bracket Notation could be done with a sub block, | |
662 | but bracket notation is meant to be much more concise. | |
663 | ||
664 | Bracket Notation is like a miniature "template" system (in the sense | |
665 | of L<Text::Template|Text::Template>, not in the sense of C++ templates), | |
666 | where normal text is passed thru basically as is, but text in special | |
667 | regions is specially interpreted. In Bracket Notation, you use brackets | |
668 | ("[...]" -- not "{...}"!) to note sections that are specially interpreted. | |
669 | ||
670 | For example, here all the areas that are taken literally are underlined with | |
671 | a "^", and all the in-bracket special regions are underlined with an X: | |
672 | ||
673 | "Minimum ([_1]) is larger than maximum ([_2])!\n", | |
674 | ^^^^^^^^^ XX ^^^^^^^^^^^^^^^^^^^^^^^^^^ XX ^^^^ | |
675 | ||
676 | When that string is compiled from bracket notation into a real Perl sub, | |
677 | it's basically turned into: | |
678 | ||
679 | sub { | |
680 | my $lh = $_[0]; | |
681 | my @params = @_; | |
682 | return join '', | |
683 | "Minimum (", | |
684 | ...some code here... | |
685 | ") is larger than maximum (", | |
686 | ...some code here... | |
687 | ")!\n", | |
688 | } | |
689 | # to be called by $lh->maketext(KEY, params...) | |
690 | ||
691 | In other words, text outside bracket groups is turned into string | |
692 | literals. Text in brackets is rather more complex, and currently follows | |
693 | these rules: | |
694 | ||
695 | =over | |
696 | ||
697 | =item * | |
698 | ||
699 | Bracket groups that are empty, or which consist only of whitespace, | |
700 | are ignored. (Examples: "[]", "[ ]", or a [ and a ] with returns | |
701 | and/or tabs and/or spaces between them.) | |
702 | ||
703 | Otherwise, each group is taken to be a comma-separated group of items, | |
704 | and each item is interpreted as follows: | |
705 | ||
706 | =item * | |
707 | ||
708 | An item that is "_I<digits>" or "_-I<digits>" is interpreted as | |
709 | $_[I<value>]. I.e., "_1" is replaced with $_[1], and "_-3" is interpreted | |
710 | as $_[-3] (in which case @_ should have at least three elements in it). | |
711 | Note that $_[0] is the language handle, and is typically not named | |
712 | directly. | |
713 | ||
714 | =item * | |
715 | ||
716 | An item "_*" is interpreted to mean "all of @_ except $_[0]". | |
717 | I.e., C<@_[1..$#_]>. Note that this is an empty list in the case | |
718 | of calls like $lh->maketext(I<key>) where there are no | |
719 | parameters (except $_[0], the language handle). | |
720 | ||
721 | =item * | |
722 | ||
723 | Otherwise, each item is interpreted as a string literal. | |
724 | ||
725 | =back | |
726 | ||
727 | The group as a whole is interpreted as follows: | |
728 | ||
729 | =over | |
730 | ||
731 | =item * | |
732 | ||
733 | If the first item in a bracket group looks like a method name, | |
734 | then that group is interpreted like this: | |
735 | ||
736 | $lh->that_method_name( | |
737 | ...rest of items in this group... | |
738 | ), | |
739 | ||
740 | =item * | |
741 | ||
ff5ad48a JH |
742 | If the first item in a bracket group is "*", it's taken as shorthand |
743 | for the very commonly called "quant" method. Similarly, if the first | |
744 | item in a bracket group is "#", it's taken to be shorthand for | |
745 | "numf". | |
746 | ||
747 | =item * | |
748 | ||
9378c581 JH |
749 | If the first item in a bracket group is empty-string, or "_*" |
750 | or "_I<digits>" or "_-I<digits>", then that group is interpreted | |
751 | as just the interpolation of all its items: | |
752 | ||
753 | join('', | |
754 | ...rest of items in this group... | |
755 | ), | |
756 | ||
757 | Examples: "[_1]" and "[,_1]", which are synonymous; and | |
f918d677 | 758 | "C<[,ID-(,_4,-,_2,)]>", which compiles as |
9378c581 JH |
759 | C<join "", "ID-(", $_[4], "-", $_[2], ")">. |
760 | ||
761 | =item * | |
762 | ||
763 | Otherwise this bracket group is invalid. For example, in the group | |
764 | "[!@#,whatever]", the first item C<"!@#"> is neither empty-string, | |
765 | "_I<number>", "_-I<number>", "_*", nor a valid method name; and so | |
766 | Locale::Maketext will throw an exception if you try compiling an | |
767 | expression containing this bracket group. | |
768 | ||
769 | =back | |
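The rules above can be tried out with a short script. This is only a sketch: the classes Demo::L10N and Demo::L10N::en are hypothetical names, defined inline here (normally each would live in its own .pm file), and the _AUTO lexicon is used just so that every key serves as its own value.

```perl
use strict;
use warnings;
use Locale::Maketext;

# Hypothetical project and language classes, defined inline for brevity.
{
    package Demo::L10N;
    our @ISA = ('Locale::Maketext');

    package Demo::L10N::en;
    our @ISA     = ('Demo::L10N');
    our %Lexicon = ( _AUTO => 1 );    # every key doubles as its value
}

my $lh = Demo::L10N->get_handle('en') or die "What language?";

# "[_1]" and "[,_1]" are synonymous: both interpolate $_[1].
my $plain = $lh->maketext("Value: [_1]",  "x");
my $comma = $lh->maketext("Value: [,_1]", "x");

# An empty first item means "just join the rest of the items".
my $id = $lh->maketext("[,ID-(,_4,-,_2,)]", "a", "b", "c", "d");

# "*" is shorthand for quant, and "#" is shorthand for numf.
my $pies  = $lh->maketext("I ate [*,_1,pie].", 3);
my $bytes = $lh->maketext("[#,_1] bytes", 1234567);

print "$_\n" for $plain, $comma, $id, $pies, $bytes;
```

Each call compiles its key the first time it is seen, so an invalid group such as "[!@#,whatever]" would only throw its exception when that particular key is first used.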
770 | ||
771 | Note, incidentally, that items in each group are comma-separated, | |
772 | not C</\s*,\s*/>-separated. That is, you might expect that this | |
773 | bracket group: | |
774 | ||
775 | "Hoohah [foo, _1 , bar ,baz]!" | |
776 | ||
777 | would compile to this: | |
778 | ||
779 | sub { | |
780 | my $lh = $_[0]; | |
781 | return join '', | |
782 | "Hoohah ", | |
783 | $lh->foo( $_[1], "bar", "baz"), | |
784 | "!", | |
785 | } | |
786 | ||
787 | But it actually compiles as this: | |
788 | ||
789 | sub { | |
790 | my $lh = $_[0]; | |
791 | return join '', | |
792 | "Hoohah ", | |
793 | $lh->foo(" _1 ", " bar ", "baz"), #!!! | |
794 | "!", | |
795 | } | |
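If you want to see this for yourself, something like the following sketch should do. The method "show" is a hypothetical helper (it is not part of Locale::Maketext) that wraps each argument in angle brackets so that stray spaces become visible; the class names are likewise made up for the demonstration.

```perl
use strict;
use warnings;
use Locale::Maketext;

{
    package Demo::L10N;
    our @ISA = ('Locale::Maketext');

    # Hypothetical helper: shows exactly which arguments it was given.
    sub show {
        my ($lh, @items) = @_;
        return join '', map { "<$_>" } @items;
    }

    package Demo::L10N::en;
    our @ISA     = ('Demo::L10N');
    our %Lexicon = ( _AUTO => 1 );
}

my $lh = Demo::L10N->get_handle('en') or die "What language?";

# With spaces around the commas, " _1 " is no longer "_1": it is
# taken as a string literal, spaces and all.
my $spaced = $lh->maketext("Hoohah [show, _1 , bar ,baz]!", "X");

# Without the spaces, _1 is replaced by the first parameter.
my $tight = $lh->maketext("Hoohah [show,_1,bar,baz]!", "X");

print "$spaced\n$tight\n";
```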
796 | ||
797 | In the notation discussed so far, the characters "[" and "]" are given | |
798 | special meaning, for opening and closing bracket groups, and "," has | |
799 | a special meaning inside bracket groups, where it separates items in the | |
800 | group. This raises the question of how you'd express a literal "[" or | |
801 | "]" in a Bracket Notation string, and how you'd express a literal | |
802 | comma inside a bracket group. For this purpose I've adopted "~" (tilde) | |
803 | as an escape character: "~[" means a literal '[' character anywhere | |
804 | in Bracket Notation (i.e., regardless of whether you're in a bracket | |
805 | group or not), and ditto for "~]" meaning a literal ']', and "~," meaning | |
806 | a literal comma. (Altho "," means a literal comma outside of | |
807 | bracket groups -- it's only inside bracket groups that commas are special.) | |
808 | ||
809 | And on the off chance you need a literal tilde in a bracket expression, | |
810 | you get it with "~~". | |
811 | ||
812 | Currently, an unescaped "~" before a character | |
813 | other than a bracket or a comma is taken to mean just a "~" and that | |
f918d677 | 814 | character. I.e., "~X" means the same as "~~X" -- i.e., one literal tilde, |
9378c581 JH |
815 | and then one literal "X". However, by using "~X", you are assuming that |
816 | no future version of Maketext will use "~X" as a magic escape sequence. | |
817 | In practice this is not a great problem, since first off you can just | |
818 | write "~~X" and not worry about it; second off, I doubt I'll add lots | |
819 | of new magic characters to bracket notation; and third off, you | |
820 | aren't likely to want literal "~" characters in your messages anyway, | |
821 | since it's not a character with wide use in natural language text. | |
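A brief sketch of the tilde escapes in use, assuming a hypothetical Demo::L10N class with an _AUTO English lexicon defined inline. Note that the escaped comma is used only inside a bracket group, which is the only place where commas are special.

```perl
use strict;
use warnings;
use Locale::Maketext;

{
    package Demo::L10N;
    our @ISA = ('Locale::Maketext');

    package Demo::L10N::en;
    our @ISA     = ('Demo::L10N');
    our %Lexicon = ( _AUTO => 1 );
}

my $lh = Demo::L10N->get_handle('en') or die "What language?";

# "~[" and "~]" give literal brackets.
my $brackets = $lh->maketext("Press ~[Enter~] to [_1]", "continue");

# Inside a group, "~," is a literal comma rather than an item separator,
# so the items here are: empty-string, _1, ", " and _2.
my $pair = $lh->maketext("Items: [,_1,~, ,_2]", "x", "y");

# "~~" gives one literal tilde.
my $approx = $lh->maketext("Approx. ~~[_1]", 5);

print "$brackets\n$pair\n$approx\n";
```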
822 | ||
823 | Brackets must be balanced -- every openbracket must have | |
824 | one matching closebracket, and vice versa. So these are all B<invalid>: | |
825 | ||
826 | "I ate [quant,_1,rhubarb pie." | |
827 | "I ate [quant,_1,rhubarb pie[." | |
828 | "I ate quant,_1,rhubarb pie]." | |
829 | "I ate quant,_1,rhubarb pie[." | |
830 | ||
831 | Currently, bracket groups do not nest. That is, you B<cannot> say: | |
832 | ||
833 | "Foo [bar,baz,[quux,quuux]]\n"; | |
834 | ||
835 | If you need a notation that's that powerful, use normal Perl: | |
836 | ||
837 | %Lexicon = ( | |
838 | ... | |
839 | "some_key" => sub { | |
840 | my $lh = $_[0]; | |
841 | join '', | |
842 | "Foo ", | |
843 | $lh->bar('baz', $lh->quux('quuux')), | |
844 | "\n", | |
845 | }, | |
846 | ... | |
847 | ); | |
848 | ||
849 | Or write the "bar" method so you don't need to pass it the | |
850 | output from calling quux. | |
851 | ||
852 | I do not anticipate that you will need (or particularly want) | |
853 | to nest bracket groups, but you are welcome to email me with | |
854 | convincing (real-life) arguments to the contrary. | |
855 | ||
856 | =head1 AUTO LEXICONS | |
857 | ||
858 | If maketext goes to look in an individual %Lexicon for an entry | |
859 | for I<key> (where I<key> does not start with an underscore), and | |
860 | sees none, B<but does see> an entry of "_AUTO" => I<some_true_value>, | |
861 | then we actually define $Lexicon{I<key>} = I<key> right then and there, | |
862 | and then use that value as if it had been there all | |
863 | along. This happens before we even look in any superclass %Lexicons! | |
864 | ||
865 | (This is meant to be somewhat like the AUTOLOAD mechanism in | |
866 | Perl's function call system -- or, looked at another way, | |
867 | like the L<AutoLoader|AutoLoader> module.) | |
868 | ||
869 | I can picture all sorts of circumstances where you just | |
870 | do not want lookup to be able to fail (since failing | |
871 | normally means that maketext throws a C<die>, altho | |
872 | see the next section for greater control over that). But | |
873 | here's one circumstance where _AUTO lexicons are meant to | |
874 | be I<especially> useful: | |
875 | ||
876 | As you're writing an application, you decide as you go what messages | |
877 | you need to emit. Normally you'd go to write this: | |
878 | ||
879 | if(-e $filename) { | |
880 | go_process_file($filename) | |
881 | } else { | |
882 | print "Couldn't find file \"$filename\"!\n"; | |
883 | } | |
884 | ||
885 | but since you anticipate localizing this, you write: | |
886 | ||
887 | use ThisProject::I18N; | |
888 | my $lh = ThisProject::I18N->get_handle(); | |
889 | # For the moment, assume that things are set up so | |
890 | # that we load class ThisProject::I18N::en | |
f918d677 | 891 | # and that that's the class that $lh belongs to. |
9378c581 JH |
892 | ... |
893 | if(-e $filename) { | |
894 | go_process_file($filename) | |
895 | } else { | |
896 | print $lh->maketext( | |
897 | "Couldn't find file \"[_1]\"!\n", $filename | |
898 | ); | |
899 | } | |
900 | ||
901 | Now, right after you've just written the above lines, you'd | |
902 | normally have to go open the file | |
903 | ThisProject/I18N/en.pm, and immediately add an entry: | |
904 | ||
905 | "Couldn't find file \"[_1]\"!\n" | |
906 | => "Couldn't find file \"[_1]\"!\n", | |
907 | ||
908 | But I consider that somewhat of a distraction from the work | |
909 | of getting the main code working -- to say nothing of the fact | |
910 | that I often have to play with the program a few times before | |
911 | I can decide exactly what wording I want in the messages (which | |
912 | in this case would require me to go changing three lines of code: | |
913 | the call to maketext with that key, and then the two lines in | |
914 | ThisProject/I18N/en.pm). | |
915 | ||
916 | However, if you set "_AUTO => 1" in the %Lexicon in | |
917 | ThisProject/I18N/en.pm (assuming that English (en) is | |
918 | the language that all your programmers will be using for this | |
919 | project's internal message keys), then you don't ever have to | |
920 | go adding lines like this | |
921 | ||
922 | "Couldn't find file \"[_1]\"!\n" | |
923 | => "Couldn't find file \"[_1]\"!\n", | |
924 | ||
925 | to ThisProject/I18N/en.pm, because if _AUTO is true there, | |
926 | then just looking for an entry with the key "Couldn't find | |
927 | file \"[_1]\"!\n" in that lexicon will cause it to be added, | |
928 | with that value! | |
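Here is a small sketch of that behavior, with hypothetical class names defined inline instead of in ThisProject/I18N/en.pm. It checks the lexicon before and after the lookup, and also shows that a key starting with an underscore is not auto-made.

```perl
use strict;
use warnings;
use Locale::Maketext;

{
    package Demo::L10N;
    our @ISA = ('Locale::Maketext');

    package Demo::L10N::en;
    our @ISA     = ('Demo::L10N');
    our %Lexicon = ( _AUTO => 1 );
}

my $lh  = Demo::L10N->get_handle('en') or die "What language?";
my $key = "Couldn't find file \"[_1]\"!\n";

# The entry is not there until the first lookup asks for it.
my $before = exists $Demo::L10N::en::Lexicon{$key} ? 1 : 0;
my $text   = $lh->maketext($key, "notes.txt");
my $after  = exists $Demo::L10N::en::Lexicon{$key} ? 1 : 0;

# Keys with a leading underscore are immune to _AUTO, so this lookup
# fails, and (with no "fail" attribute set) maketext dies.
my $underscore_ok = eval { $lh->maketext("_hidden"); 1 } ? 1 : 0;

print $text;
print "in lexicon before: $before, after: $after\n";
print "underscore key auto-made: $underscore_ok\n";
```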
929 | ||
930 | Note that the reason that keys that start with "_" | |
931 | are immune to _AUTO isn't anything generally magical about | |
932 | the underscore character -- I just wanted a way to have most | |
933 | lexicon keys be autoable, except for possibly a few, and I | |
934 | arbitrarily decided to use a leading underscore as a signal | |
935 | to distinguish those few. | |
936 | ||
937 | =head1 CONTROLLING LOOKUP FAILURE | |
938 | ||
939 | If you call $lh->maketext(I<key>, ...parameters...), | |
940 | and there's no entry I<key> in $lh's class's %Lexicon, nor | |
941 | in the superclass %Lexicon hash, I<and> if we can't auto-make | |
942 | I<key> (because either it starts with a "_", or because none | |
943 | of its lexicons have C<_AUTO =E<gt> 1,>), then we have | |
944 | failed to find a normal way to maketext I<key>. What then | |
945 | happens in these failure conditions depends on the $lh object's | |
946 | "fail" attribute. | |
947 | ||
948 | If the language handle has no "fail" attribute, maketext | |
949 | will simply throw an exception (i.e., it calls C<die>, mentioning | |
950 | the I<key> whose lookup failed, and naming the line number where | |
951 | the call to $lh->maketext(I<key>,...) was made). | |
952 | ||
953 | If the language handle has a "fail" attribute whose value is a | |
954 | coderef, then $lh->maketext(I<key>,...params...) gives up and calls: | |
955 | ||
956 | return &{$that_subref}($lh, $key, @params); | |
957 | ||
958 | Otherwise, the "fail" attribute's value should be a string denoting | |
959 | a method name, so that $lh->maketext(I<key>,...params...) can | |
960 | give up with: | |
961 | ||
962 | return $lh->$that_method_name($key, @params); | |
963 | ||
964 | The "fail" attribute can be accessed with the C<fail_with> method: | |
965 | ||
966 | # Set to a coderef: | |
967 | $lh->fail_with( \&failure_handler ); | |
968 | ||
969 | # Set to a method name: | |
970 | $lh->fail_with( 'failure_method' ); | |
971 | ||
972 | # Set to nothing (i.e., so failure throws a plain exception) | |
973 | $lh->fail_with( undef ); | |
974 | ||
975 | # Simply read: | |
976 | $handler = $lh->fail_with(); | |
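As a sketch of the coderef form, assuming hypothetical inline classes whose lexicon has no _AUTO entry: the handler below records each failing key and returns a visible placeholder instead of dying.

```perl
use strict;
use warnings;
use Locale::Maketext;

{
    package Demo::L10N;
    our @ISA = ('Locale::Maketext');

    package Demo::L10N::en;
    our @ISA     = ('Demo::L10N');
    our %Lexicon = ( 'Hello' => 'Hello' );    # note: no _AUTO here
}

my $lh = Demo::L10N->get_handle('en') or die "What language?";

my @missing;
$lh->fail_with( sub {
    my ($failing_lh, $key, @params) = @_;
    push @missing, $key;              # remember what failed
    return "<missing: $key>";         # and return a placeholder
} );

my $known   = $lh->maketext('Hello');      # found in the lexicon
my $unknown = $lh->maketext('Goodbye');    # handed to the handler

# Reading the attribute back gives the same coderef.
my $handler_type = ref $lh->fail_with();

# Setting it to undef restores the default: failure throws an exception.
$lh->fail_with(undef);
my $dies_again = eval { $lh->maketext('Goodbye'); 1 } ? 0 : 1;

print "$known\n$unknown\n";
```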
977 | ||
978 | Now, as to what you may want to do with these handlers: Maybe you'd | |
979 | want to log what key failed for what class, and then die. Maybe | |
980 | you don't like C<die> and instead you want to send the error message | |
981 | to STDOUT (or wherever) and then merely C<exit()>. | |
982 | ||
983 | Or maybe you don't want to C<die> at all! Maybe you could use a | |
984 | handler like this: | |
985 | ||
986 | # Make all lookups fall back onto an English value, | |
987 | # but after we log it for later fingerpointing. | |
988 | my $lh_backup = ThisProject->get_handle('en'); | |
989 | open(LEX_FAIL_LOG, ">>wherever/lex.log") || die "GNAARGH $!"; | |
990 | sub lex_fail { | |
991 | my($failing_lh, $key, @params) = @_; | |
992 | print LEX_FAIL_LOG scalar(localtime), "\t", | |
993 | ref($failing_lh), "\t", $key, "\n"; | |
994 | return $lh_backup->maketext($key,@params); | |
995 | } | |
996 | ||
997 | Some users have said that they think this whole mechanism of | |
998 | having a "fail" attribute at all seems a rather pointless complication. | |
999 | But I want Locale::Maketext to be usable for software projects of I<any> | |
1000 | scale and type; and different software projects have different ideas | |
1001 | of what the right thing is to do in failure conditions. I could simply | |
1002 | say that failure always throws an exception, and that if you want to be | |
1003 | careful, you'll just have to wrap every call to $lh->maketext in an | |
1004 | S<eval { }>. However, I want programmers to reserve the right (via | |
1005 | the "fail" attribute) to treat lookup failure as something other than | |
1006 | an exception of the same level of severity as a config file being | |
f918d677 | 1007 | unreadable, or some essential resource being inaccessible. |
9378c581 JH |
1008 | |
1009 | One possibly useful value for the "fail" attribute is the method name | |
1010 | "failure_handler_auto". This is a method defined in class | |
1011 | Locale::Maketext itself. You set it with: | |
1012 | ||
1013 | $lh->fail_with('failure_handler_auto'); | |
1014 | ||
1015 | Then when you call $lh->maketext(I<key>, ...parameters...) and | |
1016 | there's no I<key> in any of those lexicons, maketext gives up with | |
1017 | ||
1018 | return $lh->failure_handler_auto($key, @params); | |
1019 | ||
1020 | But failure_handler_auto, instead of dying or anything, compiles | |
1021 | $key, caching it in $lh->{'failure_lex'}{$key} = $compiled, | |
1022 | and then calls the compiled value, and returns that. (I.e., if | |
1023 | $key looks like bracket notation, $compiled is a sub, and we return | |
1024 | &{$compiled}(@params); but if $key is just a plain string, we just | |
1025 | return that.) | |
1026 | ||
1027 | The effect of using "failure_handler_auto" | |
1028 | is like an AUTO lexicon, except that it 1) compiles $key even if | |
1029 | it starts with "_", and 2) you have a record in the new hashref | |
1030 | $lh->{'failure_lex'} of all the keys that have failed for | |
1031 | this object. This should avoid your program dying -- as long | |
1032 | as your keys aren't actually invalid as bracket code, and as | |
1033 | long as they don't try calling methods that don't exist. | |
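A minimal sketch of this handler at work, once more with hypothetical inline classes and a lexicon that has no _AUTO entry:

```perl
use strict;
use warnings;
use Locale::Maketext;

{
    package Demo::L10N;
    our @ISA = ('Locale::Maketext');

    package Demo::L10N::en;
    our @ISA     = ('Demo::L10N');
    our %Lexicon = ( 'Hello' => 'Hello' );    # no _AUTO here
}

my $lh = Demo::L10N->get_handle('en') or die "What language?";
$lh->fail_with('failure_handler_auto');

# Not in any lexicon, so it is compiled and cached by the handler.
my $found = $lh->maketext("Found [quant,_1,file]", 3);

# Unlike an _AUTO lexicon, this works for underscore keys too.
my $odd = $lh->maketext("_odd key");

# The handle now carries a record of every key that failed.
my @failed = sort keys %{ $lh->{'failure_lex'} };

print "$found\n$odd\n";
print "failed key: $_\n" for @failed;
```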
1034 | ||
1035 | "failure_handler_auto" may not be exactly what you want, but I | |
1036 | hope it at least shows you that maketext failure can be mitigated | |
1037 | in any number of very flexible ways. If you can formalize exactly | |
1038 | what you want, you should be able to express that as a failure | |
1039 | handler. You can even make it default for every object of a given | |
1040 | class, by setting it in that class's init: | |
1041 | ||
1042 | sub init { | |
1043 | my $lh = $_[0]; # a newborn handle | |
1044 | $lh->SUPER::init(); | |
1045 | $lh->fail_with('my_clever_failure_handler'); | |
1046 | return; | |
1047 | } | |
1048 | sub my_clever_failure_handler { | |
1049 | ...your clever things here... | |
1050 | } | |
1051 | ||
1052 | =head1 HOW TO USE MAKETEXT | |
1053 | ||
1054 | Here is a brief checklist on how to use Maketext to localize | |
1055 | applications: | |
1056 | ||
1057 | =over | |
1058 | ||
1059 | =item * | |
1060 | ||
1061 | Decide what system you'll use for lexicon keys. If you insist, | |
1062 | you can use opaque IDs (if you're nostalgic for C<catgets>), | |
1063 | but I have better suggestions in the | |
1064 | section "Entries in Each Lexicon", above. Assuming you opt for | |
1065 | meaningful keys that double as values (like "Minimum ([_1]) is | |
1066 | larger than maximum ([_2])!\n"), you'll have to settle on what | |
1067 | language those should be in. For the sake of argument, I'll | |
1068 | call this English, specifically American English, "en-US". | |
1069 | ||
1070 | =item * | |
1071 | ||
1072 | Create a class for your localization project. This is | |
1073 | the name of the class that you'll use in the idiom: | |
1074 | ||
1075 | use Projname::L10N; | |
1076 | my $lh = Projname::L10N->get_handle(...) || die "Language?"; | |
1077 | ||
1078 | Assuming you call your class Projname::L10N, create a class | |
1079 | consisting minimally of: | |
1080 | ||
1081 | package Projname::L10N; | |
1082 | use base qw(Locale::Maketext); | |
1083 | ...any methods you might want all your languages to share... | |
1084 | ||
1085 | # And, assuming you want the base class to be an _AUTO lexicon, | |
1086 | # as is discussed a few sections up: | |
1087 | %Lexicon = ('_AUTO' => 1); | |
1088 | 1; | |
1089 | ||
1090 | =item * | |
1091 | ||
1092 | Create a class for the language your internal keys are in. Name | |
1093 | the class after the language-tag for that language, in lowercase, | |
1094 | with dashes changed to underscores. Assuming your project's first | |
1095 | language is US English, you should call this Projname::L10N::en_us. | |
1096 | It should consist minimally of: | |
1097 | ||
1098 | package Projname::L10N::en_us; | |
1099 | use base qw(Projname::L10N); | |
1100 | %Lexicon = ( | |
1101 | '_AUTO' => 1, | |
1102 | ); | |
1103 | 1; | |
1104 | ||
1105 | (For the rest of this section, I'll assume that this "first | |
1106 | language class" of Projname::L10N::en_us has | |
1107 | an _AUTO lexicon.) | |
1108 | ||
1109 | =item * | |
1110 | ||
1111 | Go and write your program. Everywhere in your program where | |
1112 | you would say: | |
1113 | ||
1114 | print "Foobar $thing stuff\n"; | |
1115 | ||
1116 | instead do it thru maketext, using no variable interpolation in | |
1117 | the key: | |
1118 | ||
1119 | print $lh->maketext("Foobar [_1] stuff\n", $thing); | |
1120 | ||
1121 | If you get tired of constantly saying C<print $lh-E<gt>maketext>, | |
1122 | consider making a functional wrapper for it, like so: | |
1123 | ||
1124 | use Projname::L10N; | |
1125 | use vars qw($lh); | |
1126 | $lh = Projname::L10N->get_handle(...) || die "Language?"; | |
1127 | sub pmt (@) { print( $lh->maketext(@_)) } | |
1128 | # "pmt" is short for "Print MakeText" | |
1129 | $Carp::Verbose = 1; | |
1130 | # so if maketext fails, we see what made the call to pmt | |
1131 | ||
1132 | Besides whole phrases meant for output, anything language-dependent | |
1133 | should be put into the class Projname::L10N::en_us, | |
1134 | whether as methods, or as lexicon entries -- this is discussed | |
1135 | in the section "Entries in Each Lexicon", above. | |
1136 | ||
1137 | =item * | |
1138 | ||
1139 | Once the program is otherwise done, and once its localization for | |
1140 | the first language works right (via the data and methods in | |
1141 | Projname::L10N::en_us), you can get together the data for translation. | |
1142 | If your first language lexicon isn't an _AUTO lexicon, then you already | |
1143 | have all the messages explicitly in the lexicon (or else you'd be | |
1144 | getting exceptions thrown when you call $lh->maketext to get | |
1145 | messages that aren't in there). But if you were (advisedly) lazy and are | |
1146 | using an _AUTO lexicon, then you've got to make a list of all the phrases | |
1147 | that you've so far been letting _AUTO generate for you. There are very | |
1148 | many ways to assemble such a list. The most straightforward is to simply | |
1149 | grep the source for every occurrence of "maketext" (or calls | |
1150 | to wrappers around it, like the above C<pmt> function), and to log the | |
1151 | following phrase. | |
1152 | ||
1153 | =item * | |
1154 | ||
1155 | You may at this point want to consider whether your base class | |
1156 | (Projname::L10N) that all lexicons inherit from (Projname::L10N::en, | |
1157 | Projname::L10N::es, etc.) should be an _AUTO lexicon. It may be true | |
1158 | that in theory, all needed messages will be in each language class; | |
1159 | but in the presumably unlikely or "impossible" case of lookup failure, | |
1160 | you should consider whether your program should throw an exception, | |
1161 | emit text in English (or whatever your project's first language is), | |
1162 | or some more complex solution as described in the section | |
1163 | "Controlling Lookup Failure", above. | |
1164 | ||
1165 | =item * | |
1166 | ||
1167 | Submit all messages/phrases/etc. to translators. | |
1168 | ||
1169 | (You may, in fact, want to start with localizing to I<one> other language | |
1170 | at first, if you're not sure that you've properly abstracted the | |
1171 | language-dependent parts of your code.) | |
1172 | ||
1173 | Translators may request clarification of the situation in which a | |
1174 | particular phrase is found. For example, in English we are entirely happy | |
1175 | saying "I<n> files found", regardless of whether we mean "I looked for files, | |
1176 | and found I<n> of them" or the rather distinct situation of "I looked for | |
1177 | something else (like lines in files), and along the way I saw I<n> | |
1178 | files." This may involve rethinking things that you thought quite clear: | |
1179 | should "Edit" on a toolbar be a noun ("editing") or a verb ("to edit")? Is | |
1180 | there already a conventionalized way to express that menu option, separate | |
1181 | from the target language's normal word for "to edit"? | |
1182 | ||
1183 | In all cases where the very common phenomenon of quantification | |
1184 | (saying "I<N> files", for B<any> value of N) | |
1185 | is involved, each translator should make clear what dependencies the | |
1186 | number causes in the sentence. In many cases, dependency is | |
1187 | limited to words adjacent to the number, in places where you might | |
1188 | expect them ("I found the-?PLURAL I<N> | |
1189 | empty-?PLURAL directory-?PLURAL"), but in some cases there are | |
1190 | unexpected dependencies ("I found-?PLURAL ..."!) as well as long-distance | |
1191 | dependencies ("The I<N> directory-?PLURAL could not be deleted-?PLURAL"!). | |
1192 | ||
1193 | Remind the translators to consider the case where N is 0: | |
1194 | "0 files found" isn't exactly natural-sounding in any language, but it | |
1195 | may be unacceptable in many -- or it may condition special | |
1196 | kinds of agreement (similar to English "I didN'T find ANY files"). | |
1197 | ||
1198 | Remember to ask your translators about numeral formatting in their | |
1199 | language, so that you can override the C<numf> method as | |
1200 | appropriate. Typical variables in number formatting are: what to | |
1201 | use as a decimal point (comma? period?); what to use as a thousands | |
f918d677 | 1202 | separator (space? nonbreaking space? comma? period? small |
9378c581 JH |
1203 | middot? prime? apostrophe?); and even whether the so-called "thousands |
1204 | separator" is actually for every third digit -- I've heard reports of | |
f918d677 | 1205 | two hundred thousand being expressible as "2,00,000" for some Indian |
9378c581 JH |
1206 | (Subcontinental) languages, besides the less surprising "S<200 000>", |
1207 | "200.000", "200,000", and "200'000". Also, using a set of numeral | |
1208 | glyphs other than the usual ASCII "0"-"9" might be appreciated, as via | |
1209 | C<tr/0-9/\x{0966}-\x{096F}/> for getting digits in Devanagari script | |
1210 | (for Hindi, Konkani, others). | |
1211 | ||
1212 | The basic C<quant> method that Locale::Maketext provides should be | |
1213 | good for many languages. For some languages, it might be useful | |
1214 | to modify it (or its constituent C<numerate> method) | |
1215 | to take a plural form in the two-argument call to C<quant> | |
1216 | (as in "[quant,_1,files]") if | |
1217 | it's all-around easier to infer the singular form from the plural, than | |
1218 | to infer the plural form from the singular. | |
1219 | ||
1220 | But for other languages (as is discussed at length | |
1221 | in L<Locale::Maketext::TPJ13|Locale::Maketext::TPJ13>), simple | |
1222 | C<quant>/C<numerate> is not enough. For the particularly problematic | |
1223 | Slavic languages, what you may need is a method which you provide | |
1224 | with the number, the citation form of the noun to quantify, and | |
1225 | the case and gender that the sentence's syntax projects onto that | |
1226 | noun slot. The method would then be responsible for determining | |
1227 | what grammatical number that numeral projects onto its noun phrase, | |
1228 | and what case and gender it may override the normal case and gender | |
1229 | with; and then it would look up the noun in a lexicon providing | |
1230 | all needed inflected forms. | |
1231 | ||
1232 | =item * | |
1233 | ||
1234 | You may also wish to discuss with the translators the question of | |
1235 | how to relate different subforms of the same language tag, | |
1236 | considering how this reacts with C<get_handle>'s treatment of | |
1237 | these. For example, if a user accepts interfaces in "en, fr", and | |
1238 | you have interfaces available in "en-US" and "fr", what should | |
1239 | they get? You may wish to resolve this by establishing that "en" | |
1240 | and "en-US" are effectively synonymous, by having one class | |
1241 | zero-derive from the other. | |
1242 | ||
1243 | For some languages this issue may never come up (Danish is rarely | |
1244 | expressed as "da-DK", but instead is just "da"). And for other | |
1245 | languages, the whole concept of a "generic" form may verge on | |
1246 | being uselessly vague, particularly for interfaces involving voice | |
1247 | media in forms of Arabic or Chinese. | |
1248 | ||
1249 | =item * | |
1250 | ||
1251 | Once you've localized your program/site/etc. for all desired | |
1252 | languages, be sure to show the result (whether live, or via | |
1253 | screenshots) to the translators. Once they approve, make every | |
1254 | effort to have it then checked by at least one other speaker of | |
1255 | that language. This holds true even when (or especially when) the | |
1256 | translation is done by one of your own programmers. Some | |
1257 | kinds of systems may be harder to find testers for than others, | |
1258 | depending on the amount of domain-specific jargon and concepts | |
1259 | involved -- it's easier to find people who can tell you whether | |
1260 | they approve of your translation for "delete this message" in an | |
1261 | email-via-Web interface, than to find people who can give you | |
1262 | an informed opinion on your translation for "attribute value" | |
1263 | in an XML query tool's interface. | |
1264 | ||
1265 | =back | |
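The checklist above can be condensed into one small script. This is only a sketch under several assumptions: the classes are defined inline (each would normally be its own .pm file), the German lexicon entry is an illustrative translation rather than a vetted one, and the C<numf> override is just one possible way to get a period as thousands separator and a comma as decimal point.

```perl
use strict;
use warnings;
use Locale::Maketext;

{
    # The project class.
    package Projname::L10N;
    our @ISA = ('Locale::Maketext');

    # The first-language class: keys double as values via _AUTO.
    package Projname::L10N::en_us;
    our @ISA     = ('Projname::L10N');
    our %Lexicon = ( _AUTO => 1 );

    # A second language, with explicit entries (illustrative translation).
    package Projname::L10N::de;
    our @ISA     = ('Projname::L10N');
    our %Lexicon = (
        "Found [quant,_1,file].\n" => "[quant,_1,Datei,Dateien] gefunden.\n",
    );

    # Override numf for this language: "." between groups of three
    # digits, and "," as the decimal point.
    sub numf {
        my ($lh, $num) = @_;
        my ($int, $frac) = split /\./, "$num", 2;
        1 while $int =~ s/^([-+]?\d+)(\d{3})/$1.$2/;
        return defined $frac ? "$int,$frac" : $int;
    }
}

my %output;
for my $tag ('en-US', 'de') {
    my $lh = Projname::L10N->get_handle($tag) or die "Language?";
    $output{$tag} = $lh->maketext("Found [quant,_1,file].\n", 1234);
    print $output{$tag};
}
```

Because C<quant> formats its number by calling C<numf> on the handle, overriding C<numf> in a language class changes the numbers in every "[quant,...]" group for that language.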
1266 | ||
1267 | =head1 SEE ALSO | |
1268 | ||
1269 | I recommend reading all of these: | |
1270 | ||
1271 | L<Locale::Maketext::TPJ13|Locale::Maketext::TPJ13> -- my I<The Perl | |
1272 | Journal> article about Maketext. It explains many important concepts | |
1273 | underlying Locale::Maketext's design, and some insight into why | |
1274 | Maketext is better than the plain old approach of just having | |
1275 | message catalogs that are just databases of sprintf formats. | |
1276 | ||
1277 | L<File::Findgrep|File::Findgrep> is a sample application/module | |
f918d677 JH |
1278 | that uses Locale::Maketext to localize its messages. For a larger |
1279 | internationalized system, see also L<Apache::MP3>. | |
9378c581 JH |
1280 | |
1281 | L<I18N::LangTags|I18N::LangTags>. | |
1282 | ||
1283 | L<Win32::Locale|Win32::Locale>. | |
1284 | ||
1285 | RFC 3066, I<Tags for the Identification of Languages>, | |
1286 | is at http://sunsite.dk/RFC/rfc/rfc3066.html | |
1287 | ||
1288 | RFC 2277, I<IETF Policy on Character Sets and Languages> | |
1289 | is at http://sunsite.dk/RFC/rfc/rfc2277.html -- much of it is | |
1290 | just things of interest to protocol designers, but it explains | |
1291 | some basic concepts, like the distinction between locales and | |
1292 | language-tags. | |
1293 | ||
1294 | The manual for GNU C<gettext>. The gettext dist is available in | |
1295 | C<ftp://prep.ai.mit.edu/pub/gnu/> -- get | |
1296 | a recent gettext tarball and look in its "doc/" directory, there's | |
1297 | an easily browsable HTML version in there. The | |
1298 | gettext documentation asks lots of questions worth thinking | |
1299 | about, even if some of their answers are sometimes wonky, | |
1300 | particularly where they start talking about pluralization. | |
1301 | ||
1302 | The Locale/Maketext.pm source. Observe that the module is much | |
1303 | shorter than its documentation! | |
1304 | ||
1305 | =head1 COPYRIGHT AND DISCLAIMER | |
1306 | ||
f918d677 | 1307 | Copyright (c) 1999-2003 Sean M. Burke. All rights reserved. |
9378c581 JH |
1308 | |
1309 | This library is free software; you can redistribute it and/or modify | |
1310 | it under the same terms as Perl itself. | |
1311 | ||
1312 | This program is distributed in the hope that it will be useful, but | |
1313 | without any warranty; without even the implied warranty of | |
1314 | merchantability or fitness for a particular purpose. | |
1315 | ||
1316 | =head1 AUTHOR | |
1317 | ||
1318 | Sean M. Burke C<sburke@cpan.org> | |
1319 | ||
1320 | =cut |