	1	=head1 NAME
	2
	3	perlrebackslash - Perl Regular Expression Backslash Sequences and Escapes
	4
	5	=head1 DESCRIPTION
	6
	7	The top level documentation about Perl regular expressions
	8	is found in L<perlre>.
	9
	10	This document describes all backslash and escape sequences. After
	11	explaining the role of the backslash, it lists all the sequences that have
	12	a special meaning in Perl regular expressions (in alphabetical order),
	13	then describes each of them.
	14
	15	Most sequences are described in detail in different documents; the primary
	16	purpose of this document is to have a quick reference guide describing all
	17	backslash and escape sequences.
	18
	19	=head2 The backslash
	20
	21	In a regular expression, the backslash can perform one of two tasks:
	22	it either takes away the special meaning of the character following it
	23	(for instance, C<\|> matches a vertical bar, it's not an alternation),
	24	or it is the start of a backslash or escape sequence.
	25
	26	The rules determining what it is are quite simple: if the character
	27	following the backslash is an ASCII punctuation (non-word) character (that is,
	28	anything that is not a letter, digit, or underscore), then the backslash just
	29	takes away any special meaning of the character following it.
	30
	31	If the character following the backslash is an ASCII letter or an ASCII digit,
	32	then the sequence may be special; if so, it's listed below. A few letters have
	33	not been used yet, so escaping them with a backslash doesn't change them to be
	34	special. A future version of Perl may assign a special meaning to them, so if
	35	you have warnings turned on, Perl issues a warning if you use such a
	36	sequence [1].
	37
	38	It is, however, guaranteed that backslash or escape sequences never have a
	39	punctuation character following the backslash, not now, and not in a future
	40	version of Perl 5. So it is safe to put a backslash in front of a non-word
	41	character.
	42
	43	Note that the backslash itself is special; if you want to match a backslash,
	44	you have to escape the backslash with a backslash: C</\\/> matches a single
	45	backslash.
	46
	47	=over 4
	48
	49	=item [1]
	50
	51	There is one exception. If you use an alphanumeric character as the
	52	delimiter of your pattern (which you probably shouldn't do for readability
	53	reasons), you have to escape the delimiter if you want to match
	54	it. Perl won't warn then. See also L<perlop/Gory details of parsing
	55	quoted constructs>.
	56
	57	=back
	58
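As a minimal sketch of the rules above (the variable names are illustrative only, and an ASCII platform is assumed), the following shows that a backslash before a punctuation character always yields the literal character:

```perl
use strict;
use warnings;

# A backslash before an ASCII punctuation character removes any special
# meaning, so the character is matched literally.
my $bar_ok    = "a|b" =~ /a\|b/ ? 1 : 0;   # \| is a literal vertical bar
my $no_alt    = "ab"  =~ /a\|b/ ? 1 : 0;   # so "ab" (no bar) does not match
my $alt_ok    = "ab"  =~ /a|b/  ? 1 : 0;   # unescaped | is alternation
my $hash_ok   = "a#b" =~ /a\#b/ ? 1 : 0;   # escaping a non-special '#' is harmless

# The backslash itself must be escaped to be matched.
my $string    = 'a\\b';                    # three characters: a, backslash, b
my $bslash_ok = $string =~ /a\\b/ ? 1 : 0;
```

Because no punctuation character will ever start a special sequence, escaping every non-word character in this way is safe.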
	59
	60	=head2 All the sequences and escapes
	61
	62	Those not usable within a bracketed character class (like C<[\da-z]>) are marked
	63	as C<Not in [].>
	64
	65	 \000              Octal escape sequence.  See also \o{}.
	66	 \1                Absolute backreference.  Not in [].
	67	 \a                Alarm or bell.
	68	 \A                Beginning of string.  Not in [].
	69	 \b                Word/non-word boundary. (Backspace in []).
	70	 \B                Not a word/non-word boundary.  Not in [].
	71	 \cX               Control-X
	72	 \C                Single octet, even under UTF-8.  Not in [].
	73	 \d                Character class for digits.
	74	 \D                Character class for non-digits.
	75	 \e                Escape character.
	76	 \E                Turn off \Q, \L and \U processing.  Not in [].
	77	 \f                Form feed.
	78	 \F                Foldcase till \E.  Not in [].
	79	 \g{}, \g1         Named, absolute or relative backreference.  Not in [].
	80	 \G                Pos assertion.  Not in [].
	81	 \h                Character class for horizontal whitespace.
	82	 \H                Character class for non horizontal whitespace.
	83	 \k{}, \k<>, \k''  Named backreference.  Not in [].
	84	 \K                Keep the stuff left of \K.  Not in [].
	85	 \l                Lowercase next character.  Not in [].
	86	 \L                Lowercase till \E.  Not in [].
	87	 \n                (Logical) newline character.
	88	 \N                Any character but newline.  Experimental.  Not in [].
	89	 \N{}              Named or numbered (Unicode) character or sequence.
	90	 \o{}              Octal escape sequence.
	91	 \p{}, \pP         Character with the given Unicode property.
	92	 \P{}, \PP         Character without the given Unicode property.
	93	 \Q                Quote (disable) pattern metacharacters till \E.  Not
	94	                   in [].
	95	 \r                Return character.
	96	 \R                Generic new line.  Not in [].
	97	 \s                Character class for whitespace.
	98	 \S                Character class for non whitespace.
	99	 \t                Tab character.
	100	 \u                Titlecase next character.  Not in [].
	101	 \U                Uppercase till \E.  Not in [].
	102	 \v                Character class for vertical whitespace.
	103	 \V                Character class for non vertical whitespace.
	104	 \w                Character class for word characters.
	105	 \W                Character class for non-word characters.
	106	 \x{}, \x00        Hexadecimal escape sequence.
	107	 \X                Unicode "extended grapheme cluster".  Not in [].
	108	 \z                End of string.  Not in [].
	109	 \Z                End of string.  Not in [].
	110
	111	=head2 Character Escapes
	112
	113	=head3 Fixed characters
	114
	115	A handful of characters have a dedicated I<character escape>. The following
	116	table shows them, along with their ASCII code points (in decimal and hex),
	117	their ASCII name, the control escape on ASCII platforms and a short
	118	description. (For EBCDIC platforms, see L<perlebcdic/OPERATOR DIFFERENCES>.)
	119
	120	 Seq.  Code Point  ASCII   Cntrl   Description.
	121	       Dec    Hex
	122	  \a     7     07    BEL    \cG    alarm or bell
	123	  \b     8     08     BS    \cH    backspace [1]
	124	  \e    27     1B    ESC    \c[    escape character
	125	  \f    12     0C     FF    \cL    form feed
	126	  \n    10     0A     LF    \cJ    line feed [2]
	127	  \r    13     0D     CR    \cM    carriage return
	128	  \t     9     09    TAB    \cI    tab
	129
	130	=over 4
	131
	132	=item [1]
	133
	134	C<\b> is the backspace character only inside a character class. Outside a
	135	character class, C<\b> is a word/non-word boundary.
	136
	137	=item [2]
	138
	139	C<\n> matches a logical newline. Perl converts between C<\n> and your
	140	OS's native newline character when reading from or writing to text files.
	141
	142	=back
	143
	144	=head4 Example
	145
	146	$str =~ /\t/; # Matches if $str contains a (horizontal) tab.
	147
	148	=head3 Control characters
	149
	150	C<\c> is used to denote a control character; the character following C<\c>
	151	determines the value of the construct. For example the value of C<\cA> is
	152	C<chr(1)>, and the value of C<\cb> is C<chr(2)>, etc.
	153	The gory details are in L<perlop/"Regexp Quote-Like Operators">. A complete
	154	list of what C<chr(1)>, etc. means for ASCII and EBCDIC platforms is in
	155	L<perlebcdic/OPERATOR DIFFERENCES>.
	156
	157	Note that C<\c\> alone at the end of a regular expression (or double-quoted
	158	string) is not valid. The backslash must be followed by another character.
	159	That is, C<\c\I<X>> means C<chr(28) . 'I<X>'> for all characters I<X>.
	160
	161	To write platform-independent code, you must use C<\N{I<NAME>}> instead, like
	162	C<\N{ESCAPE}> or C<\N{U+001B}>; see L<charnames>.
	163
	164	Mnemonic: I<c>ontrol character.
	165
	166	=head4 Example
	167
	168	$str =~ /\cK/; # Matches if $str contains a vertical tab (control-K).
	169
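The following sketch, which assumes an ASCII platform and uses illustrative variable names, compares a control escape with its code point and shows the platform-independent C<\N{U+...}> spelling of ESCAPE:

```perl
use strict;
use warnings;

# On an ASCII platform \cK is chr(11), the vertical tab.
my $vt_is_11   = ord("\cK") == 11 ? 1 : 0;
# The letter after \c may be given in either case.
my $case_same  = "\ck" eq "\cK" ? 1 : 0;
my $ctrl_match = "a\x0Bb" =~ /a\cKb/ ? 1 : 0;

# \N{U+001B} names ESCAPE by its Unicode code point, so the pattern
# does not depend on the native character set.
my $esc_match  = "\e[0m" =~ /\N{U+001B}\[/ ? 1 : 0;
```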
	170	=head3 Named or numbered characters and character sequences
	171
	172	Unicode characters have a Unicode name and numeric code point (ordinal)
	173	value. Use the
	174	C<\N{}> construct to specify a character by either of these values.
	175	Certain sequences of characters also have names.
	176
	177	To specify by name, the name of the character or character sequence goes
	178	between the curly braces.
	179
	180	To specify a character by Unicode code point, use the form C<\N{U+I<code
	181	point>}>, where I<code point> is a number in hexadecimal that gives the
	182	code point that Unicode has assigned to the desired character. It is
	183	customary but not required to use leading zeros to pad the number to 4
	184	digits. Thus C<\N{U+0041}> means C<LATIN CAPITAL LETTER A>, and you will
	185	rarely see it written without the two leading zeros. C<\N{U+0041}> means
	186	"A" even on EBCDIC machines (where the ordinal value of "A" is not 0x41).
	187
	188	It is even possible to give your own names to characters and character
	189	sequences. For details, see L<charnames>.
	190
	191	(There is an expanded internal form that you may see in debug output:
	192	C<\N{U+I<code point>.I<code point>...}>.
	193	The C<...> means any number of these I<code point>s separated by dots.
	194	This represents the sequence formed by the characters. This is an internal
	195	form only, subject to change, and you should not try to use it yourself.)
	196
	197	Mnemonic: I<N>amed character.
	198
	199	Note that a character or character sequence expressed as a named
	200	or numbered character is considered a character without special
	201	meaning by the regex engine, and will match "as is".
	202
	203	=head4 Example
	204
	205	$str =~ /\N{THAI CHARACTER SO SO}/; # Matches the Thai SO SO character
	206
	207	use charnames 'Cyrillic'; # Loads Cyrillic names.
	208	$str =~ /\N{ZHE}\N{KA}/; # Match "ZHE" followed by "KA".
	209
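As a small sketch (variable names are illustrative), the same character can be specified by name or by code point, and a named character is matched "as is" rather than as a metacharacter. The explicit C<use charnames ':full'> is included on the assumption that the code may run on a Perl older than 5.16, where names are not loaded automatically.

```perl
use strict;
use warnings;
use charnames ':full';   # assumption: needed for \N{name} on Perls before 5.16

my $by_name  = "A" =~ /\A\N{LATIN CAPITAL LETTER A}\z/ ? 1 : 0;
my $by_code  = "A" =~ /\A\N{U+0041}\z/ ? 1 : 0;
my $unpadded = "A" =~ /\A\N{U+41}\z/   ? 1 : 0;   # leading zeros are optional

# PLUS SIGN is taken literally; it does not act as a quantifier.
my $plus_literal = "a+" =~ /\Aa\N{PLUS SIGN}\z/ ? 1 : 0;
my $plus_quant   = "aa" =~ /\Aa\N{PLUS SIGN}\z/ ? 1 : 0;
```
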
	210	=head3 Octal escapes
	211
	212	There are two forms of octal escapes. Each is used to specify a character by
	213	its code point specified in octal notation.
	214
	215	One form, available starting in Perl 5.14, looks like C<\o{...}>, where the dots
	216	represent one or more octal digits. It can be used for any Unicode character.
	217
	218	It was introduced to avoid the potential problems with the other form,
	219	available in all Perls. That form consists of a backslash followed by three
	220	octal digits. One problem with this form is that it can look exactly like an
	221	old-style backreference (see
	222	L</Disambiguation rules between old-style octal escapes and backreferences>
	223	below.) You can avoid this by making the first of the three digits always a
	224	zero, but that makes \077 the largest code point specifiable.
	225
	226	In some contexts, a backslash followed by two or even one octal digits may be
	227	interpreted as an octal escape, sometimes with a warning, and because of some
	228	bugs, sometimes with surprising results. Also, if you are creating a regex
	229	out of smaller snippets concatenated together, and you use fewer than three
	230	digits, the beginning of one snippet may be interpreted as adding digits to the
	231	ending of the snippet before it. See L</Absolute referencing> for more
	232	discussion and examples of the snippet problem.
	233
	234	Note that a character expressed as an octal escape is considered
	235	a character without special meaning by the regex engine, and will match
	236	"as is".
	237
	238	To summarize, the C<\o{}> form is always safe to use, and the other form is
	239	safe to use for code points through \077 when you use exactly three digits to
	240	specify them.
	241
	242	Mnemonic: I<0>ctal or I<o>ctal.
	243
	244	=head4 Examples (assuming an ASCII platform)
	245
	246	$str = "Perl";
	247	$str =~ /\o{120}/; # Match, "\120" is "P".
	248	$str =~ /\120/; # Same.
	249	$str =~ /\o{120}+/; # Match, "\120" is "P", it's repeated at least once
	250	$str =~ /\120+/; # Same.
	251	$str =~ /P\053/; # No match, "\053" is "+" and taken literally.
	252	/\o{23073}/ # Black foreground, white background smiling face.
	253	/\o{4801234567}/ # Raises a warning, and yields chr(4)
	254
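The snippet problem described above can be sketched as follows, assuming an ASCII platform and Perl 5.14 or later for C<\o{}>. The snippet variables are hypothetical; they stand for pattern fragments built elsewhere and concatenated.

```perl
use strict;
use warnings;

my $digit     = '0';        # second snippet: a literal digit zero
my $bare_lf   = '\12';      # first snippet, old style: intended as line feed
my $braced_lf = '\o{12}';   # first snippet, braced: line feed

# Old style: the snippets fuse into \120, which is "P", not "\n0".
my $fused_is_p  = "P"   =~ /\A$bare_lf$digit\z/   ? 1 : 0;
my $fused_is_lf = "\n0" =~ /\A$bare_lf$digit\z/   ? 1 : 0;

# Braced: the escape ends at the closing brace, so the digit stays separate.
my $braced_ok   = "\n0" =~ /\A$braced_lf$digit\z/ ? 1 : 0;
```
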
	255	=head4 Disambiguation rules between old-style octal escapes and backreferences
	256
	257	Octal escapes of the C<\000> form outside of bracketed character classes
	258	potentially clash with old-style backreferences (see L</Absolute referencing>
	259	below). They both consist of a backslash followed by numbers. So Perl has to
	260	use heuristics to determine whether it is a backreference or an octal escape.
	261	Perl uses the following rules to disambiguate:
	262
	263	=over 4
	264
	265	=item 1
	266
	267	If the backslash is followed by a single digit, it's a backreference.
	268
	269	=item 2
	270
	271	If the first digit following the backslash is a 0, it's an octal escape.
	272
	273	=item 3
	274
	275	If the number following the backslash is N (in decimal), and Perl already
	276	has seen N capture groups, Perl considers this a backreference. Otherwise,
	277	it considers it an octal escape. If N has more than three digits, Perl
	278	takes only the first three for the octal escape; the rest are matched as is.
	279
	280	my $pat = "(" x 999;
	281	$pat .= "a";
	282	$pat .= ")" x 999;
	283	/^($pat)\1000$/; # Matches 'aa'; there are 1000 capture groups.
	284	/^$pat\1000$/; # Matches 'a@0'; there are 999 capture groups
	285	# and \1000 is seen as \100 (a '@') and a '0'
	286
	287	=back
	288
	289	You can force a backreference interpretation always by using the C<\g{...}>
	290	form. You can force an octal interpretation always by using the C<\o{...}>
	291	form, or for numbers up through \077 (= 63 decimal), by using three digits,
	292	beginning with a "0".
	293
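A short sketch of the unambiguous spellings, with illustrative variable names and assuming Perl 5.14 or later for C<\o{}>:

```perl
use strict;
use warnings;

# \g{1} is always a backreference to group 1.
my $backref = "aa"     =~ /\A(a)\g{1}\z/ ? 1 : 0;

# \o{1} is always the character with code point 1, never a backreference.
my $octal   = "a\x01"  =~ /\A(a)\o{1}\z/ ? 1 : 0;
my $not_ref = "aa"     =~ /\A(a)\o{1}\z/ ? 1 : 0;

# Three digits starting with 0 are also always an octal escape (rule 2).
my $padded  = "a\x01"  =~ /\A(a)\001\z/  ? 1 : 0;
```
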
	294	=head3 Hexadecimal escapes
	295
	296	Like octal escapes, there are two forms of hexadecimal escapes, but both start
	297	with the same thing, C<\x>. This is followed by either exactly two hexadecimal
	298	digits forming a number, or a hexadecimal number of arbitrary length surrounded
	299	by curly braces. The hexadecimal number is the code point of the character you
	300	want to express.
	301
	302	Note that a character expressed as one of these escapes is considered a
	303	character without special meaning by the regex engine, and will match
	304	"as is".
	305
	306	Mnemonic: heI<x>adecimal.
	307
	308	=head4 Examples (assuming an ASCII platform)
	309
	310	$str = "Perl";
	311	$str =~ /\x50/; # Match, "\x50" is "P".
	312	$str =~ /\x50+/; # Match, "\x50" is "P", it is repeated at least once
	313	$str =~ /P\x2B/; # No match, "\x2B" is "+" and taken literally.
	314
	315	/\x{2603}\x{2602}/ # Snowman with an umbrella.
	316	# The Unicode character 2603 is a snowman,
	317	# the Unicode character 2602 is an umbrella.
	318	/\x{263B}/ # Black smiling face.
	319	/\x{263b}/ # Same, the hex digits A - F are case insensitive.
	320
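The two hexadecimal forms can be compared in a small sketch (ASCII platform assumed, variable names illustrative):

```perl
use strict;
use warnings;

my $two_digit = "Perl" =~ /\A\x50erl\z/   ? 1 : 0;   # exactly two hex digits
my $braced    = "Perl" =~ /\A\x{50}erl\z/ ? 1 : 0;   # braced form, any length

# The hex digits A-F may be written in either case.
my $case_same = "\x{263B}" eq "\x{263b}" ? 1 : 0;

# \x2B is a literal '+', not a quantifier.
my $plus_literal = "P+" =~ /\AP\x2B\z/ ? 1 : 0;
my $plus_quant   = "PP" =~ /\AP\x2B\z/ ? 1 : 0;
```
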
	321	=head2 Modifiers
	322
	323	A number of backslash sequences have to do with changing the character,
	324	or characters following them. C<\l> will lowercase the character following
	325	it, while C<\u> will uppercase (or, more accurately, titlecase) the
	326	character following it. They provide functionality similar to the
	327	functions C<lcfirst> and C<ucfirst>.
	328
	329	To uppercase or lowercase several characters, one might want to use
	330	C<\L> or C<\U>, which will lowercase/uppercase all characters following
	331	them, until either the end of the pattern or the next occurrence of
	332	C<\E>, whichever comes first. They provide functionality similar to what
	333	the functions C<lc> and C<uc> provide.
	334
	335	C<\Q> is used to quote (disable) pattern metacharacters, up to the next
	336	C<\E> or the end of the pattern. C<\Q> adds a backslash to any character
	337	that could have special meaning to Perl. In the ASCII range, it quotes
	338	every character that isn't a letter, digit, or underscore. See
	339	L<perlfunc/quotemeta> for details on what gets quoted for non-ASCII
	340	code points. Using this ensures that any character between C<\Q> and
	341	C<\E> will be matched literally, not interpreted as a metacharacter by
	342	the regex engine.
	343
	344	C<\F> can be used to casefold all characters following, up to the next C<\E>
	345	or the end of the pattern. It provides the functionality similar to
	346	the C<fc> function.
	347
	348	Mnemonic: I<L>owercase, I<U>ppercase, I<F>old-case, I<Q>uotemeta, I<E>nd.
	349
	350	=head4 Examples
	351
	352	$sid = "sid";
	353	$greg = "GrEg";
	354	$miranda = "(Miranda)";
	355	$str =~ /\u$sid/; # Matches 'Sid'
	356	$str =~ /\L$greg/; # Matches 'greg'
	357	$str =~ /\Q$miranda\E/; # Matches '(Miranda)', as if the pattern
	358	# had been written as /\(Miranda\)/
	359
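The effect of C<\Q> can be checked against C<quotemeta> in a small sketch (variable names follow the examples above; the anchors are added only to make the comparisons exact):

```perl
use strict;
use warnings;

my $sid     = "sid";
my $greg    = "GrEg";
my $miranda = "(Miranda)";

# \Q...\E backslashes the parentheses, so they match literally.
my $quoted   = "(Miranda)" =~ /\A\Q$miranda\E\z/ ? 1 : 0;
# Without \Q the parentheses form a capture group instead.
my $unquoted = "(Miranda)" =~ /\A$miranda\z/     ? 1 : 0;
my $same_as_quotemeta = quotemeta($miranda) eq '\(Miranda\)' ? 1 : 0;

my $title = "Sid"  =~ /\A\u$sid\z/     ? 1 : 0;   # \u affects one character
my $lower = "greg" =~ /\A\L$greg\E\z/  ? 1 : 0;   # \L runs until \E
```
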
	360	=head2 Character classes
	361
	362	Perl regular expressions have a large range of character classes. Some of
	363	the character classes are written as a backslash sequence. We will briefly
	364	discuss those here; full details of character classes can be found in
	365	L<perlrecharclass>.
	366
	367	C<\w> is a character class that matches any single I<word> character
	368	(letters, digits, Unicode marks, and connector punctuation (like the
	369	underscore)). C<\d> is a character class that matches any decimal
	370	digit, while the character class C<\s> matches any whitespace character.
	371	New in perl 5.10.0 are the classes C<\h> and C<\v> which match horizontal
	372	and vertical whitespace characters.
	373
	374	The exact set of characters matched by C<\d>, C<\s>, and C<\w> varies
	375	depending on various pragmas and regular expression modifiers. It is
	376	possible to restrict the match to the ASCII range by using the C</a>
	377	regular expression modifier. See L<perlrecharclass>.
	378
	379	The uppercase variants (C<\W>, C<\D>, C<\S>, C<\H>, and C<\V>) are
	380	character classes that match, respectively, any character that isn't a
	381	word character, digit, whitespace, horizontal whitespace, or vertical
	382	whitespace.
	383
	384	Mnemonics: I<w>ord, I<d>igit, I<s>pace, I<h>orizontal, I<v>ertical.
	385
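As a sketch of how the C</a> modifier narrows these classes (Perl 5.14 or later assumed; variable names are illustrative), consider U+0660, the Arabic-Indic digit zero:

```perl
use strict;
use warnings;

my $arabic_zero = "\x{660}";   # a decimal digit outside the ASCII range

my $unicode_digit = $arabic_zero =~ /\A\d\z/  ? 1 : 0;   # \d matches it
my $ascii_digit   = $arabic_zero =~ /\A\d\z/a ? 1 : 0;   # under /a it does not
my $ascii_nondig  = $arabic_zero =~ /\A\D\z/a ? 1 : 0;   # so \D matches under /a

# \h and \v split whitespace into horizontal and vertical.
my $tab_is_h  = "\t" =~ /\A\h\z/ ? 1 : 0;
my $tab_is_v  = "\t" =~ /\A\v\z/ ? 1 : 0;
my $lf_is_v   = "\n" =~ /\A\v\z/ ? 1 : 0;
```
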
	386	=head3 Unicode classes
	387
	388	C<\pP> (where C<P> is a single letter) and C<\p{Property}> are used to
	389	match a character that matches the given Unicode property; properties
	390	include things like "letter", or "thai character". Capitalizing the
	391	sequence to C<\PP> and C<\P{Property}> makes the sequence match a character
	392	that doesn't match the given Unicode property. For more details, see
	393	L<perlrecharclass/Backslash sequences> and
	394	L<perlunicode/Unicode Character Properties>.
	395
	396	Mnemonic: I<p>roperty.
	397
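A brief sketch of property matching, using the single-letter and braced forms (variable names are illustrative; U+0E0B is used as a sample Thai letter):

```perl
use strict;
use warnings;

my $thai = "\x{E0B}";   # a letter from the Thai script

my $is_letter = $thai =~ /\A\pL\z/      ? 1 : 0;   # single-letter form
my $is_thai   = $thai =~ /\A\p{Thai}\z/ ? 1 : 0;   # braced form
my $is_upper  = "A"   =~ /\A\p{Lu}\z/   ? 1 : 0;   # uppercase letter

# The capitalized forms negate the property.
my $digit_not_letter = "7" =~ /\A\PL\z/    ? 1 : 0;
my $lower_not_upper  = "a" =~ /\A\P{Lu}\z/ ? 1 : 0;
```
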
	398	=head2 Referencing
	399
	400	If capturing parentheses are used in a regular expression, we can refer
	401	to the part of the source string that was matched, and match exactly the
	402	same thing. There are three ways of referring to such a I<backreference>:
	403	absolutely, relatively, and by name.
	404
	405	=for later add link to perlrecapture
	406
	407	=head3 Absolute referencing
	408
	409	Either C<\gI<N>> (starting in Perl 5.10.0), or C<\I<N>> (old-style) where I<N>
	410	is a positive (unsigned) decimal number of any length is an absolute reference
	411	to a capturing group.
	412
	413	I<N> refers to the Nth set of parentheses, so C<\gI<N>> refers to whatever has
	414	been matched by that set of parentheses. Thus C<\g1> refers to the first
	415	capture group in the regex.
	416
	417	The C<\gI<N>> form can be equivalently written as C<\g{I<N>}>
	418	which avoids ambiguity when building a regex by concatenating shorter
	419	strings. Otherwise if you had a regex C<qr/$a$b/>, and C<$a> contained
	420	C<"\g1">, and C<$b> contained C<"37">, you would get C</\g137/> which is
	421	probably not what you intended.
	422
	423	In the C<\I<N>> form, I<N> must not begin with a "0", and there must be at
	424	least I<N> capturing groups, or else I<N> is considered an octal escape
	425	(but something like C<\18> is the same as C<\0018>; that is, the octal escape
	426	C<"\001"> followed by a literal digit C<"8">).
	427
	428	Mnemonic: I<g>roup.
	429
	430	=head4 Examples
	431
	432	/(\w+) \g1/; # Finds a duplicated word, (e.g. "cat cat").
	433	/(\w+) \1/; # Same thing; written old-style
	434	/(.)(.)\g2\g1/; # Match a four letter palindrome (e.g. "ABBA").
	435
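The concatenation hazard described above can be sketched as follows. The snippet variables are hypothetical. The C<eval> is there because the fused pattern refers to group 137, which does not exist, so the pattern is expected either to fail to compile or to fail to match.

```perl
use strict;
use warnings;

my $ref_bare   = '(\w)\g1';     # backreference without braces
my $ref_braced = '(\w)\g{1}';   # backreference with braces
my $tail       = '37';          # literal digits appended by another snippet

# Without braces the digits fuse into a reference to group 137.
my $fused   = $ref_bare . $tail;
my $bare_ok = eval { "aa37" =~ /\A$ref_bare$tail\z/ ? 1 : 0 };

# With braces the reference stays \g{1}, followed by a literal "37".
my $braced_ok = "aa37" =~ /\A$ref_braced$tail\z/ ? 1 : 0;
```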
	436
	437	=head3 Relative referencing
	438
	439	C<\g-I<N>> (starting in Perl 5.10.0) is used for relative addressing. (It can
	440	be written as C<\g{-I<N>}>.) It refers to the I<N>th group before the
	441	C<\g{-I<N>}>.
	442
	443	The big advantage of this form is that it makes it much easier to write
	444	patterns with references that can be interpolated in larger patterns,
	445	even if the larger pattern also contains capture groups.
	446
	447	=head4 Examples
	448
	449	/(A) # Group 1
	450	( # Group 2
	451	(B) # Group 3
	452	\g{-1} # Refers to group 3 (B)
	453	\g{-3} # Refers to group 1 (A)
	454	)
	455	/x; # Matches "ABBA".
	456
	457	my $qr = qr /(.)(.)\g{-2}\g{-1}/; # Matches 'abab', 'cdcd', etc.
	458	/$qr$qr/ # Matches 'ababcdcd'.
	459
	460	=head3 Named referencing
	461
	462	C<\g{I<name>}> (starting in Perl 5.10.0) can be used to back refer to a
	463	named capture group, dispensing completely with having to think about capture
	464	buffer positions.
	465
	466	To be compatible with .NET regular expressions, C<\g{name}> may also be
	467	written as C<\k{name}>, C<< \k<name> >> or C<\k'name'>.
	468
	469	To prevent any ambiguity, I<name> must not start with a digit nor contain a
	470	hyphen.
	471
	472	=head4 Examples
	473
	474	/(?<word>\w+) \g{word}/ # Finds duplicated word, (e.g. "cat cat")
	475	/(?<word>\w+) \k{word}/ # Same.
	476	/(?<word>\w+) \k<word>/ # Same.
	477	/(?<letter1>.)(?<letter2>.)\g{letter2}\g{letter1}/
	478	# Match a four letter palindrome (e.g. "ABBA")
	479
	480	=head2 Assertions
	481
	482	Assertions are conditions that have to be true; they don't actually
	483	match parts of the substring. There are six assertions that are written as
	484	backslash sequences.
	485
	486	=over 4
	487
	488	=item \A
	489
	490	C<\A> only matches at the beginning of the string. If the C</m> modifier
	491	isn't used, then C</\A/> is equivalent to C</^/>. However, if the C</m>
	492	modifier is used, then C</^/> also matches after internal newlines, but the meaning
	493	of C</\A/> isn't changed by the C</m> modifier. C<\A> matches at the beginning
	494	of the string regardless of whether the C</m> modifier is used.
	495
	496	=item \z, \Z
	497
	498	C<\z> and C<\Z> match at the end of the string. If the C</m> modifier isn't
	499	used, then C</\Z/> is equivalent to C</$/>; that is, it matches at the
	500	end of the string, or one before the newline at the end of the string. If the
	501	C</m> modifier is used, then C</$/> also matches before internal newlines, but the
	502	meaning of C</\Z/> isn't changed by the C</m> modifier. C<\Z> matches at
	503	the end of the string (or just before a trailing newline) regardless of whether
	504	the C</m> modifier is used.
	505
	506	C<\z> is just like C<\Z>, except that it does not match before a trailing
	507	newline. C<\z> matches at the end of the string only, regardless of the
	508	modifiers used, and not just before a newline. It is how to anchor the
	509	match to the true end of the string under all conditions.
	510
	511	=item \G
	512
	513	C<\G> is usually used only in combination with the C</g> modifier. If the
	514	C</g> modifier is used and the match is done in scalar context, Perl
	515	remembers where in the source string the last match ended, and the next time,
	516	it will start the match from where it ended the previous time.
	517
	518	C<\G> matches the point where the previous match on that string ended,
	519	or the beginning of that string if there was no previous match.
	520
	521	=for later add link to perlremodifiers
	522
	523	Mnemonic: I<G>lobal.
	524
	525	=item \b, \B
	526
	527	C<\b> matches at any place between a word and a non-word character; C<\B>
	528	matches at any place between characters where C<\b> doesn't match. C<\b>
	529	and C<\B> assume there's a non-word character before the beginning and after
	530	the end of the source string; so C<\b> will match at the beginning (or end)
	531	of the source string if the source string begins (or ends) with a word
	532	character. Otherwise, C<\B> will match.
	533
	534	Do not use something like C<\b=head\d\b> and expect it to match the
	535	beginning of a line. It can't, because for there to be a boundary before
	536	the non-word "=", there must be a word character immediately previous.
	537	All boundary determinations look for word characters alone, not for
	538	non-word characters nor for string ends. It may help to understand how
	539	C<\b> and C<\B> work by equating them as follows:
	540
	541	\b really means (?:(?<=\w)(?!\w)|(?<!\w)(?=\w))
	542	\B really means (?:(?<=\w)(?=\w)|(?<!\w)(?!\w))
	543
	544	Mnemonic: I<b>oundary.
	545
	546	=back
	547
	548	=head4 Examples
	549
	550	"cat" =~ /\Acat/; # Match.
	551	"cat" =~ /cat\Z/; # Match.
	552	"cat\n" =~ /cat\Z/; # Match.
	553	"cat\n" =~ /cat\z/; # No match.
	554
	555	"cat" =~ /\bcat\b/; # Matches.
	556	"cats" =~ /\bcat\b/; # No match.
	557	"cat" =~ /\bcat\B/; # No match.
	558	"cats" =~ /\bcat\B/; # Match.
	559
	560	while ("cat dog" =~ /(\w+)/g) {
	561	    print $1; # Prints 'catdog'
	562	}
	563	while ("cat dog" =~ /\G(\w+)/g) {
	564	    print $1; # Prints 'cat'
	565	}
	566
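The interaction between the anchors and the C</m> modifier can be sketched on a two-line string (variable names are illustrative):

```perl
use strict;
use warnings;

my $text = "cat\ndog\n";

# Under /m, ^ and $ also match around internal newlines; \A and \Z do not.
my $caret_m  = $text =~ /^dog/m   ? 1 : 0;
my $big_a_m  = $text =~ /\Adog/m  ? 1 : 0;
my $dollar_m = $text =~ /cat$/m   ? 1 : 0;
my $big_z_m  = $text =~ /cat\Z/m  ? 1 : 0;

# \Z tolerates one trailing newline; \z does not.
my $end_big_z   = $text =~ /dog\Z/ ? 1 : 0;
my $end_small_z = $text =~ /dog\z/ ? 1 : 0;

# \G forces each /g match to start exactly where the previous one ended.
my $input = "aab";
my @runs;
while ($input =~ /\G(a)/g) {
    push @runs, $1;
}
```
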
	567	=head2 Misc
	568
	569	Here we document the backslash sequences that don't fall in one of the
	570	categories above. These are:
	571
	572	=over 4
	573
	574	=item \C
	575
	576	C<\C> always matches a single octet, even if the source string is encoded
	577	in UTF-8 format, and the character to be matched is a multi-octet character.
	578	This is very dangerous, because it violates
	579	the logical character abstraction and can cause UTF-8 sequences to become malformed.
	580
	581	Mnemonic: oI<C>tet.
	582
	583	=item \K
	584
	585	This appeared in perl 5.10.0. Anything matched left of C<\K> is
	586	not included in C<$&>, and will not be replaced if the pattern is
	587	used in a substitution. This lets you write C<s/PAT1 \K PAT2/REPL/x>
	588	instead of C<s/(PAT1) PAT2/${1}REPL/x> or C<s/(?<=PAT1) PAT2/REPL/x>.
	589
	590	Mnemonic: I<K>eep.
	591
	592	=item \N
	593
	594	This is an experimental feature new to perl 5.12.0. It matches any character
	595	that is B<not> a newline. It is a short-hand for writing C<[^\n]>, and is
	596	identical to the C<.> metasymbol, except under the C</s> flag, which changes
	597	the meaning of C<.>, but not C<\N>.
	598
	599	Note that C<\N{...}> can mean a
	600	L<named or numbered character
	601	|/Named or numbered characters and character sequences>.
	602
	603	Mnemonic: Complement of I<\n>.
	604
	605	=item \R
	606	X<\R>
	607
	608	C<\R> matches a I<generic newline>; that is, anything considered a
	609	linebreak sequence by Unicode. This includes all characters matched by
	610	C<\v> (vertical whitespace), and the multi character sequence C<"\x0D\x0A">
	611	(carriage return followed by a line feed, sometimes called the network
	612	newline; it's the end of line sequence used in Microsoft text files opened
	613	in binary mode). C<\R> is equivalent to C<< (?>\x0D\x0A|\v) >>. (The
	614	reason it doesn't backtrack is that the sequence is considered
	615	inseparable. That means that
	616
	617	"\x0D\x0A" =~ /^\R\x0A$/ # No match
	618
	619	fails, because the C<\R> matches the entire string, and won't backtrack
	620	to match just the C<"\x0D">.) Since
	621	C<\R> can match a sequence of more than one character, it cannot be put
	622	inside a bracketed character class; C</[\R]/> is an error; use C<\v>
	623	instead. C<\R> was introduced in perl 5.10.0.
	624
	625	Note that this does not respect any locale that might be in effect; it
	626	matches according to the platform's native character set.
	627
	628	Mnemonic: none really. C<\R> was picked because PCRE already uses C<\R>,
	629	and more importantly because Unicode recommends such a regular expression
	630	metacharacter, and suggests C<\R> as its notation.
	631
	632	=item \X
	633	X<\X>
	634
	635	This matches a Unicode I<extended grapheme cluster>.
	636
	637	C<\X> matches quite well what normal (non-Unicode-programmer) usage
	638	would consider a single character. As an example, consider a G with some sort
	639	of diacritic mark, such as an arrow. There is no such single character in
	640	Unicode, but one can be composed by using a G followed by a Unicode "COMBINING
	641	UPWARDS ARROW BELOW", and it would be displayed by Unicode-aware software as if it
	642	were a single character.
	643
	644	Mnemonic: eI<X>tended Unicode character.
	645
	646	=back
	647
	648	=head4 Examples
	649
	650	"\x{256}" =~ /^\C\C$/; # Match as chr (0x256) takes 2 octets in UTF-8.
	651
	652	$str =~ s/foo\Kbar/baz/g; # Change any 'bar' following a 'foo' to 'baz'
	653	$str =~ s/(.)\K\g1//g; # Delete duplicated characters.
	654
	655	"\n" =~ /^\R$/; # Match, \n is a generic newline.
	656	"\r" =~ /^\R$/; # Match, \r is a generic newline.
	657	"\r\n" =~ /^\R$/; # Match, \r\n is a generic newline.
	658
	659	"P\x{307}" =~ /^\X$/ # \X matches a P with a dot above.
	660
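A few of the points above, sketched with illustrative variable names and assuming Perl 5.12 or later for C<\N>:

```perl
use strict;
use warnings;

# /s changes what . matches, but not what \N matches.
my $dot_s = "\n" =~ /\A.\z/s  ? 1 : 0;
my $n_s   = "\n" =~ /\A\N\z/s ? 1 : 0;

# Text matched to the left of \K is kept, not replaced.
(my $kept = "foobar barfoo") =~ s/foo\Kbar/baz/g;

# \R treats "\r\n" as one inseparable unit and will not backtrack into it.
my $crlf_ok      = "\r\n" =~ /\A\R\z/   ? 1 : 0;
my $no_backtrack = "\r\n" =~ /\A\R\n\z/ ? 1 : 0;
```
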
	661	=cut