X-Git-Url: https://perl5.git.perl.org/perl5.git/blobdiff_plain/38f4139d1bbafce6f6d3a31d480780b96ab0ff5b..4364919aeabf66edaa6fb40631f8fed89f4bcfe2:/pod/perlunicode.pod

diff --git a/pod/perlunicode.pod b/pod/perlunicode.pod
index b193273..f00b110 100644
--- a/pod/perlunicode.pod
+++ b/pod/perlunicode.pod
@@ -260,11 +260,12 @@ complement B the full character-wide bit complement.
 
 =item *
 
-You can define your own mappings to be used in C<lc()>,
-C<lcfirst()>, C<uc()>, and C<ucfirst()> (or their double-quoted string inlined
-versions such as C<\U>). See
-L
-for more details.
+There is a CPAN module, L<Unicode::Casing>, which allows you to define
+your own mappings to be used in C<lc()>, C<lcfirst()>, C<uc()>, and
+C<ucfirst()> (or their double-quoted string inlined versions such as
+C<\U>). (Prior to Perl 5.16, this functionality was partially provided
+in the Perl core, but suffered from a number of insurmountable
+drawbacks, so the CPAN module was written instead.)
 
 =back
 
@@ -301,7 +302,8 @@ This formality is needed when properties are not binary; that is, if they can
 take on more values than just True and False. For example, the Bidi_Class (see
 L below), can take on several different
 values, such as Left, Right, Whitespace, and others. To match these, one needs
-to specify the property name (Bidi_Class), AND the value being matched against
+to specify both the property name (Bidi_Class), AND the value being
+matched against
 (Left, Right, etc.). This is done, as in the examples above, by having the
 two components separated by an equal sign (or interchangeably, a colon), like
 C<\p{Bidi_Class: Left}>.
@@ -469,11 +471,63 @@ The world's languages are written in many different scripts. This sentence
 written in Cyrillic, and Greek is written in, well, Greek; Japanese mainly in
 Hiragana or Katakana. There are many more.
 
-The Unicode Script property gives what script a given character is in,
-and the property can be specified with the compound form like
-C<\p{Script=Hebrew}> (short: C<\p{sc=hebr}>). Perl furnishes shortcuts for all
-script names. You can omit everything up through the equals (or colon), and
-simply write C<\p{Latin}> or C<\P{Cyrillic}>.
+The Unicode Script and Script_Extensions properties give what script a
+given character is in. Either property can be specified with the
+compound form like
+C<\p{Script=Hebrew}> (short: C<\p{sc=hebr}>), or
+C<\p{Script_Extensions=Javanese}> (short: C<\p{scx=java}>).
+In addition, Perl furnishes shortcuts for all
+C