+Named Unicode properties, scripts, and block ranges can be matched (like
+bracketed character classes) with the C<\p{}> "matches property" construct
+and its negation, C<\P{}> "doesn't match property".
+See L</"Unicode Character Properties"> for more details.
+
+You can define your own character properties and use them
+in the regular expression with the C<\p{}> or C<\P{}> construct.
+See L</"User-Defined Character Properties"> for more details.
+
+=item *
+
+The special pattern C<\X> matches a logical character, an "extended grapheme
+cluster" in Standardese. In Unicode, what appears to the user to be a single
+character, for example an accented C<G>, may in fact be composed of a sequence
+of characters, in this case a C<G> followed by an accent character. C<\X>
+will match the entire sequence.
+
+=item *
+
+The C<tr///> operator translates characters instead of bytes. Note
+that the C<tr///CU> functionality has been removed. For similar
+functionality, see C<pack('U0', ...)> and C<pack('C0', ...)>.
+
+=item *
+
+Case translation operators use the Unicode case translation tables
+when character input is provided. Note that C<uc()>, or C<\U> in
+interpolated strings, translates to uppercase, while C<ucfirst>,
+or C<\u> in interpolated strings, translates to titlecase in languages
+that make the distinction (which is equivalent to uppercase in languages
+without the distinction).
+
+=item *
+
+Most operators that deal with positions or lengths in a string will
+automatically switch to using character positions, including
+C<chop()>, C<chomp()>, C<substr()>, C<pos()>, C<index()>, C<rindex()>,
+C<sprintf()>, C<write()>, and C<length()>. An operator that
+specifically does not switch is C<vec()>. Operators that really don't
+care include operators that treat strings as a bucket of bits such as
+C<sort()>, and operators dealing with filenames.
+
+=item *
+
+The C<pack()>/C<unpack()> letter C<C> does I<not> change, since it is often
+used for byte-oriented formats. Again, think C<char> in the C language.
+
+There is a new C<U> specifier that converts between Unicode characters
+and code points. There is also a C<W> specifier that is the equivalent of
+C<chr>/C<ord> and properly handles character values even if they are above 255.
+
+=item *
+
+The C<chr()> and C<ord()> functions work on characters, similar to
+C<pack("W")> and C<unpack("W")>, I<not> C<pack("C")> and
+C<unpack("C")>. C<pack("C")> and C<unpack("C")> are methods for
+emulating byte-oriented C<chr()> and C<ord()> on Unicode strings.
+While these methods reveal the internal encoding of Unicode strings,
+that is not something one normally needs to care about at all.
+
+=item *
+
+The bit string operators, C<& | ^ ~>, can operate on character data.
+However, for backward compatibility (for instance, with bit string
+operations on strings whose characters all have ordinal values below 256),
+one should not use C<~> (the bit complement) on strings that mix characters
+with ordinal values below 256 and characters with ordinal values above 255.
+Most importantly,
+De Morgan's laws (C<~($x|$y) eq ~$x&~$y> and C<~($x&$y) eq ~$x|~$y>)
+will not hold. The reason for this mathematical I<faux pas> is that
+the complement cannot return B<both> the 8-bit (byte-wide) bit
+complement B<and> the full character-wide bit complement.
+
+=item *
+
+There is a CPAN module, L<Unicode::Casing>, which allows you to define
+your own mappings to be used in C<lc()>, C<lcfirst()>, C<uc()>, and
+C<ucfirst()> (or their double-quoted string inlined versions such as
+C<\U>). (Prior to Perl 5.16, this functionality was partially provided
+in the Perl core, but suffered from a number of insurmountable
+drawbacks, so the CPAN module was written instead.)
+
+=back
+
+=over 4
+
+=item *
+
+And finally, C<scalar reverse()> reverses by character rather than by byte.
+
+=back
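+
+As a rough illustration of several of the items above, the following sketch
+exercises C<\X>, C<length()>, C<substr()>, C<chr()>/C<ord()>, C<pack("W")>,
+C<uc()>/C<ucfirst()>, and C<scalar reverse()> on strings that contain code
+points above 255. The sample strings and variable names are chosen here purely
+for illustration; they are not part of the original text.
+
```perl
use strict;
use warnings;

# "G" followed by U+0301 COMBINING ACUTE ACCENT:
# two code points, but one user-perceived character.
my $str = "G\x{301}";

my $code_points = length $str;          # counts characters (code points)
my @clusters    = $str =~ /(\X)/g;      # each \X match is one grapheme cluster
my $n_clusters  = scalar @clusters;

# chr() and ord() work on characters, including values above 255.
my $smiley = chr(0x263A);
my $ord    = ord($smiley);

# pack("W") is the pack equivalent of chr().
my $same = pack("W", 0x263A) eq $smiley ? 1 : 0;

# substr() and scalar reverse() operate on character positions.
my $second = substr("\x{263A}bc", 1, 1);
my $rev    = scalar reverse("a\x{263A}b");

# U+01C6 has distinct uppercase (U+01C4) and titlecase (U+01C5) forms.
my $uc = uc("\x{1C6}");
my $tc = ucfirst("\x{1C6}");

printf "%d code points, %d grapheme cluster(s)\n", $code_points, $n_clusters;
printf "ord = U+%04X, pack W matches chr: %d\n", $ord, $same;
printf "uc = U+%04X, ucfirst = U+%04X\n", ord($uc), ord($tc);
```
+
+Only code point numbers are printed, which avoids "Wide character" warnings on
+a filehandle that has no encoding layer.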
+
+=head2 Unicode Character Properties
+
+(The only time that Perl considers a sequence of individual code
+points as a single logical character is in the C<\X> construct, already
+mentioned above. Therefore "character" in this discussion means a single
+Unicode code point.)
+
+Very nearly all Unicode character properties are accessible through
+regular expressions by using the C<\p{}> "matches property" construct
+and the C<\P{}> "doesn't match property" for its negation.
+
+For instance, C<\p{Uppercase}> matches any single character with the Unicode
+"Uppercase" property, while C<\p{L}> matches any character whose
+General_Category property value is "L" (Letter). Braces are not
+required for single-letter property names, so C<\p{L}> is equivalent to C<\pL>.
+
+More formally, C<\p{Uppercase}> matches any single character whose Unicode
+Uppercase property value is True, and C<\P{Uppercase}> matches any character
+whose Uppercase property value is False, and they could have been written as
+C<\p{Uppercase=True}> and C<\p{Uppercase=False}>, respectively.
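+
+The equivalence of the single and compound forms can be checked with a small
+sketch like the one below. The helper subroutines (C<is_upper> and friends)
+and the sample characters are hypothetical names and values chosen for this
+illustration.
+
```perl
use strict;
use warnings;

# Hypothetical helpers; each tests exactly one character.
sub is_upper          { return $_[0] =~ /\A\p{Uppercase}\z/      ? 1 : 0 }
sub is_upper_explicit { return $_[0] =~ /\A\p{Uppercase=True}\z/ ? 1 : 0 }
sub is_not_upper      { return $_[0] =~ /\A\P{Uppercase}\z/      ? 1 : 0 }
sub is_letter         { return $_[0] =~ /\A\pL\z/                ? 1 : 0 }

# "A", "a", GREEK CAPITAL LETTER DELTA, GREEK SMALL LETTER DELTA, and a digit.
my @samples = ("A", "a", "\x{394}", "\x{3B4}", "1");

for my $char (@samples) {
    printf "U+%04X upper=%d upper(explicit)=%d not_upper=%d letter=%d\n",
        ord($char),
        is_upper($char),
        is_upper_explicit($char),
        is_not_upper($char),
        is_letter($char);
}
```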
+
+This formality is needed when properties are not binary; that is, when they
+can take on more values than just True and False. For example, the Bidi_Class
+property (see L</"Bidirectional Character Types"> below) can take on several
+different values, such as Left_To_Right, Right_To_Left, White_Space, and
+others. To match these, one needs to specify both the property name
+(Bidi_Class) and the value being matched against (Left_To_Right,
+Right_To_Left, etc.). This is done, as in the examples above, by separating
+the two components with an equals sign (or, interchangeably, a colon), like
+C<\p{Bidi_Class: Left_To_Right}>.
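+
+A minimal sketch of the compound form for a non-binary property follows. It
+spells the Bidi_Class values with their full Unicode names (Left_To_Right,
+Right_To_Left, White_Space), and uses both the colon and the equals
+separators. The subroutine name C<bidi_class_of> is hypothetical, and only
+three of the many Bidi_Class values are distinguished.
+
```perl
use strict;
use warnings;

# Hypothetical helper: classify one character by a few Bidi_Class values.
sub bidi_class_of {
    my ($char) = @_;
    return "L"     if $char =~ /\A\p{Bidi_Class: Left_To_Right}\z/;  # colon form
    return "R"     if $char =~ /\A\p{Bidi_Class=Right_To_Left}\z/;   # equals form
    return "WS"    if $char =~ /\A\p{Bidi_Class=White_Space}\z/;
    return "other";
}

# LATIN CAPITAL LETTER A, HEBREW LETTER ALEF, SPACE, DIGIT ONE
for my $char ("A", "\x{5D0}", " ", "1") {
    printf "U+%04X => %s\n", ord($char), bidi_class_of($char);
}
```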
+
+All Unicode-defined character properties may be written in these compound forms
+of C<\p{property=value}> or C<\p{property:value}>, but Perl provides some
+additional properties that are written only in the single form, as well as
+single-form short-cuts for all binary properties and certain others described
+below, in which you may omit the property name and the equals or colon
+separator.
+
+Most Unicode character properties have at least two synonyms (or aliases if you
+prefer): a short one that is easier to type and a longer one that is more
+descriptive and hence easier to understand. Thus the "L" and "Letter" properties
+above are equivalent and can be used interchangeably. Likewise,
+"Upper" is a synonym for "Uppercase", and we could have written
+C<\p{Uppercase}> equivalently as C<\p{Upper}>. Also, there are typically
+various synonyms for the values the property can be. For binary properties,
+"True" has 3 synonyms: "T", "Yes", and "Y"; and "False" correspondingly has
+"F", "No", and "N". But be careful: a short form of a value for one property
+may not mean the same thing as the same short form for another. Thus, for the
+General_Category property, "L" means "Letter", but for the Bidi_Class property,
+"L" means "Left_To_Right". A complete list of properties and synonyms is in
+L<perluniprops>.
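+
+The following sketch illustrates both points: synonyms for the same property
+match identically, while the short value "L" selects different characters
+depending on the property it belongs to. HEBREW LETTER ALEF (U+05D0) is used
+as the sample because it is a letter written right to left; the variable names
+are illustrative only.
+
```perl
use strict;
use warnings;

my $alef = "\x{5D0}";   # HEBREW LETTER ALEF

# Short and long names for General_Category=Letter agree.
my $gc_short    = $alef =~ /\p{L}/                  ? 1 : 0;
my $gc_long     = $alef =~ /\p{Letter}/             ? 1 : 0;
my $gc_compound = $alef =~ /\p{General_Category=L}/ ? 1 : 0;

# The same short value "L" means something else for Bidi_Class.
my $bc_l = $alef =~ /\p{Bidi_Class=L}/ ? 1 : 0;
my $bc_r = $alef =~ /\p{Bidi_Class=R}/ ? 1 : 0;

# Synonyms for a binary property and for its True value.
my @upper_forms = (
    qr/\p{Uppercase}/,
    qr/\p{Upper}/,
    qr/\p{Uppercase=Y}/,
    qr/\p{Upper=Yes}/,
);
my $upper_hits = grep { "A" =~ $_ } @upper_forms;

printf "gc: %d %d %d  bc=L: %d  bc=R: %d  upper forms matching 'A': %d\n",
    $gc_short, $gc_long, $gc_compound, $bc_l, $bc_r, $upper_hits;
```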
+
+Upper/lower case differences in property names and values are irrelevant;
+thus C<\p{Upper}> means the same thing as C<\p{upper}> or even C<\p{UpPeR}>.
+Similarly, you can add or subtract underscores anywhere in the middle of a
+word, so that these are also equivalent to C<\p{U_p_p_e_r}>. And white space
+is irrelevant adjacent to non-word characters, such as the braces and the equals
+or colon separators, so C<\p{ Upper }> and C<\p{ Upper_case : Y }> are
+equivalent to these as well. In fact, white space and even
+hyphens can usually be added or deleted anywhere. So even C<\p{ Up-per case = Yes}> is
+equivalent. All this is called "loose-matching" by Unicode. The few places
+where stricter matching is used are in the middle of numbers and in the Perl
+extension properties that begin or end with an underscore. Stricter matching
+cares about white space (except adjacent to non-word characters),
+hyphens, and non-interior underscores.
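+
+As a small sketch of loose matching, the spellings below are compiled at run
+time and each is tested against the same two characters. The list holds only
+a handful of the variations described above, and the variable names are
+illustrative.
+
```perl
use strict;
use warnings;

# Several loosely-matched spellings of the same property.
my @spellings = (
    '\p{Upper}',
    '\p{upper}',
    '\p{UpPeR}',
    '\p{Upper_Case}',
    '\p{ Upper }',
    '\p{ Upper_case : Y }',
);

my ($matched_upper, $matched_lower) = (0, 0);
for my $spelling (@spellings) {
    my $re = qr/\A$spelling\z/;      # compile the spelling as a pattern
    $matched_upper++ if "A" =~ $re;
    $matched_lower++ if "a" =~ $re;
}

printf "%d of %d spellings matched 'A'; %d matched 'a'\n",
    $matched_upper, scalar @spellings, $matched_lower;
```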