=item *
-Regular expression look-ahead
+Regular expression lookahead
You can mimic class subtraction using lookahead.
For example, what UTS#18 might write as
empty line between C<\r> and C<\n>). For C<CRLF>, try the C<:crlf>
layer (see L<PerlIO>).
-=item [9] But C<L<Unicode::LineBreak>> is available.
+=item [9] But C<qr/\b{lb}/> and C<L<Unicode::LineBreak>> are available.
-This module supplies line breaking conformant with
+L<C<qrE<sol>\b{lb}E<sol>>|perlrebackslash/\b{lb}> supplies default line
+breaking conformant with
L<UAX#14 "Unicode Line Breaking Algorithm"|http://www.unicode.org/reports/tr14>.
+In addition, the module C<L<Unicode::LineBreak>>, also conformant with UAX#14,
+provides customizable line breaking.
+
=item [10]
UTF-8/UTF-EBCDIC used in Perl allows not only C<U+10000> to
C<U+10FFFF> but also beyond C<U+10FFFF>
[17] see UAX#10 "Unicode Collation Algorithms"
[18] have Unicode::Collate but not integrated to regexes
- [19] have (?<=x) and (?=x), but look-aheads or look-behinds
+ [19] have (?<=x) and (?=x), but lookaheads or lookbehinds
should see outside of the target substring
[20] need insensitive matching for linguistic features other
than case; for example, hiragana to katakana, wide and
and has extended that up to 13 bytes to encode code points up to what
can fit in a 64-bit word. However, Perl will warn if you output any of
these as being non-portable; and under strict UTF-8 input protocols,
-they are forbidden.
+they are forbidden. In addition, it is deprecated to use a code point
+larger than what a signed integer variable on your system can hold. On
+32-bit ASCII systems, this means C<0x7FFF_FFFF> is the legal maximum
+going forward (much higher on 64-bit systems).
=item *
Perl by default comes with the latest supported Unicode version built-in, but
the goal is to allow you to change to use any earlier one. In Perls
v5.20 and v5.22, however, the earliest usable version is Unicode 5.1.
-Perl v5.18 is able to handle all earlier versions.
+Perl v5.18 and v5.24 are able to handle all earlier versions.
Download the files in the desired version of Unicode from the Unicode web
site L<http://www.unicode.org>. These should replace the existing files in