Commit	Line	Data
	1	=head1 NAME
	2
	3	perlfaq6 - Regexes ($Revision: 1.27 $, $Date: 1999/05/23 16:08:30 $)
	4
	5	=head1 DESCRIPTION
	6
	7	This section is surprisingly small because the rest of the FAQ is
	8	littered with answers involving regular expressions. For example,
	9	decoding a URL and checking whether something is a number are handled
	10	with regular expressions, but those answers are found elsewhere in
	11	this document (in the sections on Data and on Networking, to be
	12	precise).
	13
	14	=head2 How can I hope to use regular expressions without creating illegible and unmaintainable code?
	15
	16	Three techniques can make regular expressions maintainable and
	17	understandable.
	18
	19	=over 4
	20
	21	=item Comments Outside the Regex
	22
	23	Describe what you're doing and how you're doing it, using normal Perl
	24	comments.
	25
	26	# turn the line into the first word, a colon, and the
	27	# number of characters on the rest of the line
	28	s/^(\w+)(.*)/ lc($1) . ":" . length($2) /meg;
	29
	30	=item Comments Inside the Regex
	31
	32	The C</x> modifier causes whitespace to be ignored in a regex pattern
	33	(except in a character class), and also allows you to use normal
	34	comments there, too. As you can imagine, whitespace and comments help
	35	a lot.
	36
	37	C</x> lets you turn this:
	38
	39	s{<(?:[^>'"]*|".*?"|'.*?')+>}{}gs;
	40
	41	into this:
	42
	43	s{ < # opening angle bracket
	44	(?: # Non-backreffing grouping paren
	45	[^>'"] * # 0 or more things that are neither > nor ' nor "
	46	| # or else
	47	".*?" # a section between double quotes (stingy match)
	48	| # or else
	49	'.*?' # a section between single quotes (stingy match)
	50	) + # all occurring one or more times
	51	> # closing angle bracket
	52	}{}gsx; # replace with nothing, i.e. delete
	53
	54	It's still not quite so clear as prose, but it is very useful for
	55	describing the meaning of each part of the pattern.
	56
	57	=item Different Delimiters
	58
	59	While we normally think of patterns as being delimited with C</>
	60	characters, they can be delimited by almost any character. L<perlre>
	61	describes this. For example, the C<s///> above uses braces as
	62	delimiters. Selecting another delimiter can avoid quoting the
	63	delimiter within the pattern:
	64
	65	s/\/usr\/local/\/usr\/share/g; # bad delimiter choice
	66	s#/usr/local#/usr/share#g; # better
	67
	68	=back
	69
	70	=head2 I'm having trouble matching over more than one line. What's wrong?
	71
	72	Either you don't have more than one line in the string you're looking at
	73	(probably), or else you aren't using the correct modifier(s) on your
	74	pattern (possibly).
	75
	76	There are many ways to get multiline data into a string. If you want
	77	it to happen automatically while reading input, you'll want to set $/
	78	(probably to '' for paragraphs or C<undef> for the whole file) to
	79	allow you to read more than one line at a time.
	80
	81	Read L<perlre> to help you decide which of C</s> and C</m> (or both)
	82	you might want to use: C</s> allows dot to include newline, and C</m>
	83	allows caret and dollar to match next to a newline, not just at the
	84	end of the string. You do need to make sure that you've actually
	85	got a multiline string in there.
	86
	87	For example, this program detects duplicate words, even when they span
	88	line breaks (but not paragraph ones). For this example, we don't need
	89	C</s> because we aren't using dot in a regular expression that we want
	90	to cross line boundaries. Neither do we need C</m> because we aren't
	91	wanting caret or dollar to match at any point inside the record next
	92	to newlines. But it's imperative that $/ be set to something other
	93	than the default, or else we won't actually ever have a multiline
	94	record read in.
	95
	96	$/ = ''; # read in whole paragraphs, not just one line
	97	while ( <> ) {
	98	while ( /\b([\w'-]+)(\s+\1)+\b/gi ) { # word starts alpha
	99	print "Duplicate $1 at paragraph $.\n";
	100	}
	101	}
	102
	103	Here's code that finds lines that begin with "From " (which would
	104	be mangled by many mailers):
	105
	106	$/ = ''; # read in whole paragraphs, not just one line
	107	while ( <> ) {
	108	while ( /^From /gm ) { # /m makes ^ match next to \n
	109	print "leading from in paragraph $.\n";
	110	}
	111	}
	112
	113	Here's code that finds everything between START and END in a paragraph:
	114
	115	undef $/; # read in whole file, not just one line or paragraph
	116	while ( <> ) {
	117	while ( /START(.*?)END/sgm ) { # /s makes . cross line boundaries
	118	print "$1\n";
	119	}
	120	}
	121
	122	=head2 How can I pull out lines between two patterns that are themselves on different lines?
	123
	124	You can use Perl's somewhat exotic C<..> operator (documented in
	125	L<perlop>):
	126
	127	perl -ne 'print if /START/ .. /END/' file1 file2 ...
	128
	129	If you wanted text and not lines, you would use
	130
	131	perl -0777 -ne 'print "$1\n" while /START(.*?)END/gs' file1 file2 ...
	132
	133	But if you want nested occurrences of C<START> through C<END>, you'll
	134	run up against the problem described in the question in this section
	135	on matching balanced text.
	136
	137	Here's another example of using C<..>:
	138
	139	while (<>) {
	140	$in_header = 1 .. /^$/;
	141	$in_body = /^$/ .. eof();
	142	# now choose between them
	143	} continue {
	144	close ARGV if eof; # reset $. at the end of each file
	145	}
	146
	147	=head2 I put a regular expression into $/ but it didn't work. What's wrong?
	148
	149	$/ must be a string, not a regular expression. Awk has to be better
	150	for something. :-)
	151
	152	Actually, you could do this if you don't mind reading the whole file
	153	into memory:
	154
	155	undef $/;
	156	@records = split /your_pattern/, <FH>;
	157
	158	The Net::Telnet module (available from CPAN) has the capability to
	159	wait for a pattern in the input stream, or time out if it doesn't
	160	appear within a certain time.
	161
	162	## Create a file with three lines.
	163	open FH, ">file";
	164	print FH "The first line\nThe second line\nThe third line\n";
	165	close FH;
	166
	167	## Get a read/write filehandle to it (requires the FileHandle module).
	168	use FileHandle; $fh = new FileHandle "+<file";
	169
	170	## Attach it to a "stream" object.
	171	use Net::Telnet;
	172	$file = new Net::Telnet (-fhopen => $fh);
	173
	174	## Search for the second line and print out the third.
	175	$file->waitfor('/second line\n/');
	176	print $file->getline;
	177
	178	=head2 How do I substitute case insensitively on the LHS, but preserving case on the RHS?
	179
	180	Here's a lovely Perlish solution by Larry Rosler. It exploits
	181	properties of bitwise xor on ASCII strings.
	182
	183	$_= "this is a TEsT case";
	184
	185	$old = 'test';
	186	$new = 'success';
	187
	188	s{(\Q$old\E)}
	189	{ uc $new | (uc $1 ^ $1) .
	190	(uc(substr $1, -1) ^ substr $1, -1) x
	191	(length($new) - length $1)
	192	}egi;
	193
	194	print;
	195
	196	And here it is as a subroutine, modelled after the above:
	197
	198	sub preserve_case($$) {
	199	my ($old, $new) = @_;
	200	my $mask = uc $old ^ $old;
	201
	202	uc $new | $mask .
	203	substr($mask, -1) x (length($new) - length($old))
	204	}
	205
	206	$a = "this is a TEsT case";
	207	$a =~ s/(test)/preserve_case($1, "success")/egi;
	208	print "$a\n";
	209
	210	This prints:
	211
	212	this is a SUcCESS case
	213
	214	Just to show that C programmers can write C in any programming language,
	215	if you prefer a more C-like solution, the following script makes the
	216	substitution have the same case, letter by letter, as the original.
	217	(It also happens to run about 240% slower than the Perlish solution runs.)
	218	If the substitution has more characters than the string being substituted,
	219	the case of the last character is used for the rest of the substitution.
	220
	221	# Original by Nathan Torkington, massaged by Jeffrey Friedl
	222	#
	223	sub preserve_case($$)
	224	{
	225	my ($old, $new) = @_;
	226	my ($state) = 0; # 0 = no change; 1 = lc; 2 = uc
	227	my ($i, $oldlen, $newlen, $c) = (0, length($old), length($new));
	228	my ($len) = $oldlen < $newlen ? $oldlen : $newlen;
	229
	230	for ($i = 0; $i < $len; $i++) {
	231	if ($c = substr($old, $i, 1), $c =~ /[\W\d_]/) {
	232	$state = 0;
	233	} elsif (lc $c eq $c) {
	234	substr($new, $i, 1) = lc(substr($new, $i, 1));
	235	$state = 1;
	236	} else {
	237	substr($new, $i, 1) = uc(substr($new, $i, 1));
	238	$state = 2;
	239	}
	240	}
	241	# finish up with any remaining new (for when new is longer than old)
	242	if ($newlen > $oldlen) {
	243	if ($state == 1) {
	244	substr($new, $oldlen) = lc(substr($new, $oldlen));
	245	} elsif ($state == 2) {
	246	substr($new, $oldlen) = uc(substr($new, $oldlen));
	247	}
	248	}
	249	return $new;
	250	}
	251
	252	=head2 How can I make C<\w> match national character sets?
	253
	254	See L<perllocale>.
	255
	256	=head2 How can I match a locale-smart version of C</[a-zA-Z]/>?
	257
	258	One alphabetic character would be C</[^\W\d_]/>, no matter what locale
	259	you're in. Non-alphabetics would be C</[\W\d_]/> (assuming you don't
	260	consider an underscore a letter).
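
As a minimal sketch of the character class above, the helper C<is_alpha>
(a name invented for this illustration, not part of Perl) tests whether
a string consists only of alphabetic characters. What counts as
alphabetic still depends on the locale in effect; the sample strings
here are plain ASCII so that they behave the same everywhere.

```perl
use strict;
use warnings;

# is_alpha is a hypothetical helper, for illustration only.
# [^\W\d_] reads as "a \w character that is neither a digit
# nor an underscore".
sub is_alpha {
    my ($str) = @_;
    return $str =~ /\A[^\W\d_]+\z/ ? 1 : 0;
}

for my $candidate ("Perl", "perl5", "my_var") {
    printf "%-6s %s\n", $candidate,
        is_alpha($candidate) ? "alphabetic" : "not alphabetic";
}
```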
	261
	262	=head2 How can I quote a variable to use in a regex?
	263
	264	The Perl parser will expand $variable and @variable references in
	265	regular expressions unless the delimiter is a single quote. Remember,
	266	too, that the right-hand side of a C<s///> substitution is considered
	267	a double-quoted string (see L<perlop> for more details). Remember
	268	also that any regex special characters will be acted on unless you
	269	precede the substitution with \Q. Here's an example:
	270
	271	$string = "to die?";
	272	$lhs = "die?";
	273	$rhs = "sleep, no more";
	274
	275	$string =~ s/\Q$lhs/$rhs/;
	276	# $string is now "to sleep, no more"
	277
	278	Without the \Q, the regex would also spuriously match "di".
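
To see the difference, the sketch below tries the same variable against
a string that contains only "di". The two sub names are made up for this
illustration.

```perl
use strict;
use warnings;

my $lhs = "die?";

# Hypothetical helpers: use $lhs as a pattern, and as a literal string.
sub matches_as_pattern { my ($text) = @_; return $text =~ /$lhs/   ? 1 : 0 }
sub matches_as_literal { my ($text) = @_; return $text =~ /\Q$lhs/ ? 1 : 0 }

for my $text ("to die?", "to di") {
    printf "%-8s pattern=%d literal=%d\n",
        $text, matches_as_pattern($text), matches_as_literal($text);
}
```

Only the unquoted form accepts "to di", because there the C<?> makes the
"e" optional instead of standing for a question mark.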
	279
	280	=head2 What is C</o> really for?
	281
	282	Using a variable in a regular expression match forces a re-evaluation
	283	(and perhaps recompilation) each time through. The C</o> modifier
	284	locks in the regex the first time it's used. This always happens in a
	285	constant regular expression, and in fact, the pattern was compiled
	286	into the internal format at the same time your entire program was.
	287
	288	Use of C</o> is irrelevant unless variable interpolation is used in
	289	the pattern, and if so, the regex engine will neither know nor care
	290	whether the variables change after the pattern is evaluated the I<very
	291	first> time.
	292
	293	C</o> is often used to gain an extra measure of efficiency by not
	294	performing subsequent evaluations when you know it won't matter
	295	(because you know the variables won't change), or more rarely, when
	296	you don't want the regex to notice if they do.
	297
	298	For example, here's a "paragrep" program:
	299
	300	$/ = ''; # paragraph mode
	301	$pat = shift;
	302	while (<>) {
	303	print if /$pat/o;
	304	}
	305
	306	=head2 How do I use a regular expression to strip C style comments from a file?
	307
	308	While this actually can be done, it's much harder than you'd think.
	309	For example, this one-liner
	310
	311	perl -0777 -pe 's{/\*.*?\*/}{}gs' foo.c
	312
	313	will work in many but not all cases. You see, it's too simple-minded for
	314	certain kinds of C programs, in particular, those with what appear to be
	315	comments in quoted strings. For that, you'd need something like this,
	316	created by Jeffrey Friedl and later modified by Fred Curtis.
	317
	318	$/ = undef;
	319	$_ = <>;
	320	s#/\*[^*]*\*+([^/*][^*]*\*+)*/|("(\\.|[^"\\])*"|'(\\.|[^'\\])*'|.[^/"'\\]*)#$2#gs;
	321	print;
	322
	323	This could, of course, be more legibly written with the C</x> modifier, adding
	324	whitespace and comments. Here it is expanded, courtesy of Fred Curtis.
	325
	326	s{
	327	/\* ## Start of /* ... */ comment
	328	[^*]*\*+ ## Non-* followed by 1-or-more *'s
	329	(
	330	[^/*][^*]*\*+
	331	)* ## 0-or-more things which don't start with /
	332	## but do end with '*'
	333	/ ## End of /* ... */ comment
	334
	335	| ## OR various things which aren't comments:
	336
	337	(
	338	" ## Start of " ... " string
	339	(
	340	\\. ## Escaped char
	341	| ## OR
	342	[^"\\] ## Non "\
	343	)*
	344	" ## End of " ... " string
	345
	346	| ## OR
	347
	348	' ## Start of ' ... ' string
	349	(
	350	\\. ## Escaped char
	351	| ## OR
	352	[^'\\] ## Non '\
	353	)*
	354	' ## End of ' ... ' string
	355
	356	| ## OR
	357
	358	. ## Any other char
	359	[^/"'\\]* ## Chars which don't start a comment, string or escape
	360	)
	361	}{$2}gxs;
	362
	363	A slight modification also removes C++ comments:
	364
	365	s#/\*[^*]*\*+([^/*][^*]*\*+)*/|//[^\n]*|("(\\.|[^"\\])*"|'(\\.|[^'\\])*'|.[^/"'\\]*)#$2#gs;
	366
	367	=head2 Can I use Perl regular expressions to match balanced text?
	368
	369	Although Perl regular expressions are more powerful than "mathematical"
	370	regular expressions, because they feature conveniences like backreferences
	371	(C<\1> and its ilk), they still aren't powerful enough -- with
	372	the possible exception of bizarre and experimental features in the
	373	development-track releases of Perl. You still need to use non-regex
	374	techniques to parse balanced text, such as the text enclosed between
	375	matching parentheses or braces, for example.
	376
	377	An elaborate subroutine (for 7-bit ASCII only) to pull out balanced
	378	and possibly nested single chars, like C<`> and C<'>, C<{> and C<}>,
	379	or C<(> and C<)> can be found in
	380	http://www.perl.com/CPAN/authors/id/TOMC/scripts/pull_quotes.gz .
	381
	382	The C::Scan module from CPAN contains such subs for internal usage,
	383	but they are undocumented.
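
As a rough illustration of such a non-regex technique, the sketch below
walks the string one character at a time and keeps a nesting count.
C<extract_balanced> is a hypothetical helper written for this example;
it assumes two different single-character delimiters and makes no
attempt to handle quoting or escapes.

```perl
use strict;
use warnings;

# Hypothetical helper: return the text inside the first balanced pair of
# $open/$close delimiters in $text, or nothing if there is no such pair.
sub extract_balanced {
    my ($text, $open, $close) = @_;
    my $start = index($text, $open);
    return if $start < 0;

    my $depth = 0;
    for my $i ($start .. length($text) - 1) {
        my $c = substr($text, $i, 1);
        if    ($c eq $open)  { $depth++ }
        elsif ($c eq $close) { $depth-- }
        # Depth back at zero: the first opening delimiter is now closed.
        return substr($text, $start + 1, $i - $start - 1) if $depth == 0;
    }
    return;    # ran out of text before the delimiters balanced
}

print extract_balanced("f(a, g(b, c), d) + h(e)", "(", ")"), "\n";
```

The nesting count is what a classical regular expression cannot keep,
which is why the loop is written out by hand here.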
	384
	385	=head2 What does it mean that regexes are greedy? How can I get around it?
	386
	387	Most people mean that greedy regexes match as much as they can.
	388	Technically speaking, it's actually the quantifiers (C<?>, C<*>, C<+>,
	389	C<{}>) that are greedy rather than the whole pattern; Perl prefers local
	390	greed and immediate gratification to overall greed. To get non-greedy
	391	versions of the same quantifiers, use (C<??>, C<*?>, C<+?>, C<{}?>).
	392
	393	An example:
	394
	395	$s1 = $s2 = "I am very very cold";
	396	$s1 =~ s/ve.*y //; # I am cold
	397	$s2 =~ s/ve.*?y //; # I am very cold
	398
	399	Notice how the second substitution stopped matching as soon as it
	400	encountered "y ". The C<*?> quantifier effectively tells the regular
	401	expression engine to find a match as quickly as possible and pass
	402	control on to whatever is next in line, like you would if you were
	403	playing hot potato.
	404
	405	=head2 How do I process each word on each line?
	406
	407	Use the split function:
	408
	409	while (<>) {
	410	foreach $word ( split ) {
	411	# do something with $word here
	412	}
	413	}
	414
	415	Note that this isn't really a word in the English sense; it's just
	416	chunks of consecutive non-whitespace characters.
	417
	418	To work with only alphanumeric sequences, you might consider
	419
	420	while (<>) {
	421	foreach $word (m/(\w+)/g) {
	422	# do something with $word here
	423	}
	424	}
	425
	426	=head2 How can I print out a word-frequency or line-frequency summary?
	427
	428	To do this, you have to parse out each word in the input stream. We'll
	429	pretend that by word you mean chunk of alphabetics, hyphens, or
	430	apostrophes, rather than the non-whitespace chunk idea of a word given
	431	in the previous question:
	432
	433	while (<>) {
	434	while ( /(\b[^\W_\d][\w'-]+\b)/g ) { # misses "`sheep'"
	435	$seen{$1}++;
	436	}
	437	}
	438	while ( ($word, $count) = each %seen ) {
	439	print "$count $word\n";
	440	}
	441
	442	If you wanted to do the same thing for lines, you wouldn't need a
	443	regular expression:
	444
	445	while (<>) {
	446	$seen{$_}++;
	447	}
	448	while ( ($line, $count) = each %seen ) {
	449	print "$count $line";
	450	}
	451
	452	If you want these output in a sorted order, see the section on Hashes.
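
As a rough sketch of sorted output, the following orders the words by
descending count and breaks ties alphabetically. The contents of
C<%seen> are made-up sample data, and C<by_frequency> is a hypothetical
helper name.

```perl
use strict;
use warnings;

# Sample counts, made up for this illustration.
my %seen = (regex => 5, perl => 5, faq => 2, camel => 1);

# Hypothetical helper: the keys of a count hash, most frequent first,
# with ties broken alphabetically.
sub by_frequency {
    my ($counts) = @_;
    return sort { $counts->{$b} <=> $counts->{$a} or $a cmp $b } keys %$counts;
}

for my $word (by_frequency(\%seen)) {
    print "$seen{$word} $word\n";
}
```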
	453
	454	=head2 How can I do approximate matching?
	455
	456	See the module String::Approx available from CPAN.
	457
	458	=head2 How do I efficiently match many regular expressions at once?
	459
	460	The following is extremely inefficient:
	461
	462	# slow but obvious way
	463	@popstates = qw(CO ON MI WI MN);
	464	while (defined($line = <>)) {
	465	for $state (@popstates) {
	466	if ($line =~ /\b$state\b/i) {
	467	print $line;
	468	last;
	469	}
	470	}
	471	}
	472
	473	That's because Perl has to recompile all those patterns for each of
	474	the lines of the file. As of the 5.005 release, there's a much better
	475	approach, one which makes use of the new C<qr//> operator:
	476
	477	# use spiffy new qr// operator, with /i flag even
	478	use 5.005;
	479	@popstates = qw(CO ON MI WI MN);
	480	@poppats = map { qr/\b$_\b/i } @popstates;
	481	while (defined($line = <>)) {
	482	for $patobj (@poppats) {
	483	print($line), last if $line =~ /$patobj/; # print each matching line once
	484	}
	485	}
	486
	487	=head2 Why don't word-boundary searches with C<\b> work for me?
	488
	489	Two common misconceptions are that C<\b> is a synonym for C<\s+>, and
	490	that it's the edge between whitespace characters and non-whitespace
	491	characters. Neither is correct. C<\b> is the place between a C<\w>
	492	character and a C<\W> character (that is, C<\b> is the edge of a
	493	"word"). It's a zero-width assertion, just like C<^>, C<$>, and all
	494	the other anchors, so it doesn't consume any characters. L<perlre>
	495	describes the behavior of all the regex metacharacters.
	496
	497	Here are examples of the incorrect application of C<\b>, with fixes:
	498
	499	"two words" =~ /(\w+)\b(\w+)/; # WRONG
	500	"two words" =~ /(\w+)\s+(\w+)/; # right
	501
	502	" =matchless= text" =~ /\b=(\w+)=\b/; # WRONG
	503	" =matchless= text" =~ /=(\w+)=/; # right
	504
	505	Although they may not do what you thought they did, C<\b> and C<\B>
	506	can still be quite useful. For an example of the correct use of
	507	C<\b>, see the example of matching duplicate words over multiple
	508	lines.
	509
	510	An example of using C<\B> is the pattern C<\Bis\B>. This will find
	511	occurrences of "is" on the insides of words only, as in "thistle", but
	512	not "this" or "island".
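
The three words just mentioned can be checked with a short sketch
(C<has_inner_is> is a hypothetical helper name):

```perl
use strict;
use warnings;

# Hypothetical helper: true when "is" occurs strictly inside a word,
# that is, with a \w character on both sides of it.
sub has_inner_is {
    my ($word) = @_;
    return $word =~ /\Bis\B/ ? 1 : 0;
}

for my $word (qw(thistle this island)) {
    print "$word: ", (has_inner_is($word) ? "match" : "no match"), "\n";
}
```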
	513
	514	=head2 Why does using $&, $`, or $' slow my program down?
	515
	516	Because once Perl sees that you need one of these variables anywhere in
	517	the program, it has to provide them on each and every pattern match.
	518	The same mechanism that handles these provides for the use of $1, $2,
	519	etc., so you pay the same price for each regex that contains capturing
	520	parentheses. But if you never use $&, etc., in your script, then regexes
	521	I<without> capturing parentheses won't be penalized. So avoid $&, $',
	522	and $` if you can, but if you can't, once you've used them at all, use
	523	them at will because you've already paid the price. Remember that some
	524	algorithms really appreciate them. As of the 5.005 release, the $&
	525	variable is no longer "expensive" the way the other two are.
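
One common way to avoid these variables is to capture the pieces you
need explicitly. The sketch below recovers the text before, within, and
after a match from three capturing groups. C<split_around> is a
hypothetical helper name, and the approach assumes that the pattern
passed in has no capturing parentheses of its own.

```perl
use strict;
use warnings;

# Hypothetical helper: return (prematch, match, postmatch) for the first
# occurrence of $pattern in $text, without touching $`, $& or $'.
sub split_around {
    my ($text, $pattern) = @_;
    return $text =~ /\A(.*?)($pattern)(.*)\z/s ? ($1, $2, $3) : ();
}

my ($before, $matched, $after) = split_around("name = value", qr/\s*=\s*/);
print "before: [$before]\n";
print "match:  [$matched]\n";
print "after:  [$after]\n";
```

Only this one regex pays for its capturing parentheses; the other
matches in the program are unaffected.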
	526
	527	=head2 What good is C<\G> in a regular expression?
	528
	529	The notation C<\G> is used in a match or substitution in conjunction with the
	530	C</g> modifier (and ignored if there's no C</g>) to anchor the regular
	531	expression to the point just past where the last match occurred, i.e. the
	532	pos() point. A failed match resets the position of C<\G> unless the
	533	C</c> modifier is in effect.
	534
	535	For example, suppose you had a line of text quoted in standard mail
	536	and Usenet notation, (that is, with leading C<E<gt>> characters), and
	537	you want to change each leading C<E<gt>> into a corresponding C<:>. You
	538	could do so in this way:
	539
	540	s/^(>+)/':' x length($1)/gem;
	541
	542	Or, using C<\G>, the much simpler (and faster):
	543
	544	s/\G>/:/g;
	545
	546	A more sophisticated use might involve a tokenizer. The following
	547	lex-like example is courtesy of Jeffrey Friedl. It did not work in
	548	5.003 due to bugs in that release, but does work in 5.004 or better.
	549	(Note the use of C</c>, which prevents a failed match with C</g> from
	550	resetting the search position back to the beginning of the string.)
	551
	552	while (<>) {
	553	chomp;
	554	PARSER: {
	555	m/ \G( \d+\b )/gcx && do { print "number: $1\n"; redo; };
	556	m/ \G( \w+ )/gcx && do { print "word: $1\n"; redo; };
	557	m/ \G( \s+ )/gcx && do { print "space: $1\n"; redo; };
	558	m/ \G( [^\w\d]+ )/gcx && do { print "other: $1\n"; redo; };
	559	}
	560	}
	561
	562	Of course, that could have been written as
	563
	564	while (<>) {
	565	chomp;
	566	PARSER: {
	567	if ( /\G( \d+\b )/gcx ) {
	568	print "number: $1\n";
	569	redo PARSER;
	570	}
	571	if ( /\G( \w+ )/gcx ) {
	572	print "word: $1\n";
	573	redo PARSER;
	574	}
	575	if ( /\G( \s+ )/gcx ) {
	576	print "space: $1\n";
	577	redo PARSER;
	578	}
	579	if ( /\G( [^\w\d]+ )/gcx ) {
	580	print "other: $1\n";
	581	redo PARSER;
	582	}
	583	}
	584	}
	585
	586	But then you lose the vertical alignment of the regular expressions.
	587
	588	=head2 Are Perl regexes DFAs or NFAs? Are they POSIX compliant?
	589
	590	While it's true that Perl's regular expressions resemble the DFAs
	591	(deterministic finite automata) of the egrep(1) program, they are in
	592	fact implemented as NFAs (non-deterministic finite automata) to allow
	593	backtracking and backreferencing. And they aren't POSIX-style either,
	594	because those guarantee worst-case behavior for all cases. (It seems
	595	that some people prefer guarantees of consistency, even when what's
	596	guaranteed is slowness.) See the book "Mastering Regular Expressions"
	597	(from O'Reilly) by Jeffrey Friedl for all the details you could ever
	598	hope to know on these matters (a full citation appears in
	599	L<perlfaq2>).
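
One visible consequence of the backtracking design is the way
alternation is resolved: Perl tries the alternatives from left to right
and keeps the first one that lets the overall match succeed, whereas
POSIX rules call for the longest of the leftmost matches. A small
sketch (the helper name C<first_alternative> is invented for this
example):

```perl
use strict;
use warnings;

# Hypothetical helper: return the text that $regex matched in $text,
# or undef if it did not match at all.
sub first_alternative {
    my ($text, $regex) = @_;
    return $text =~ /($regex)/ ? $1 : undef;
}

# Perl stops at "foo", the first alternative that succeeds ...
print first_alternative("foobar", qr/foo|foobar/), "\n";
# ... so the longer alternative has to be listed first to win.
print first_alternative("foobar", qr/foobar|foo/), "\n";
```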
	600
	601	=head2 What's wrong with using grep or map in a void context?
	602
	603	Both grep and map build a return list, regardless of their context.
	604	This means you're making Perl go to the trouble of building up a
	605	return list that you then just ignore. That's no way to treat a
	606	programming language, you insensitive scoundrel!
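
A minimal sketch of the alternative: when only the side effect matters,
write a loop; keep C<map> and C<grep> for the cases where the returned
list is actually used. The variable names are arbitrary.

```perl
use strict;
use warnings;

my @words = qw(alpha beta gamma);
my %length_of;

# Instead of:  map { $length_of{$_} = length($_) } @words;   # list discarded
# a loop says what is meant:
$length_of{$_} = length($_) for @words;

# When the resulting list is wanted, map and grep are the right tools.
my @upper = map  { uc $_ } @words;
my @long  = grep { length($_) > 4 } @words;

print "$_ => $length_of{$_}\n" for @words;
print "@upper\n";
print "@long\n";
```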
	607
	608	=head2 How can I match strings with multibyte characters?
	609
	610	This is hard, and there's no good way. Perl does not directly support
	611	wide characters. It pretends that a byte and a character are
	612	synonymous. The following set of approaches was offered by Jeffrey
	613	Friedl, whose article in issue #5 of The Perl Journal talks about this
	614	very matter.
	615
	616	Let's suppose you have some weird Martian encoding where pairs of
	617	ASCII uppercase letters encode single Martian letters (i.e. the two
	618	bytes "CV" make a single Martian letter, as do the two bytes "SG",
	619	"VS", "XX", etc.). Other bytes represent single characters, just like
	620	ASCII.
	621
	622	So, the string of Martian "I am CVSGXX!" uses 12 bytes to encode the
	623	nine characters 'I', ' ', 'a', 'm', ' ', 'CV', 'SG', 'XX', '!'.
	624
	625	Now, say you want to search for the single character C</GX/>. Perl
	626	doesn't know about Martian, so it'll find the two bytes "GX" in the "I
	627	am CVSGXX!" string, even though that character isn't there: it just
	628	looks like it is because "SG" is next to "XX", but there's no real
	629	"GX". This is a big problem.
	630
	631	Here are a few ways, all painful, to deal with it:
	632
	633	$martian =~ s/([A-Z][A-Z])/ $1 /g; # Make sure adjacent ``martian'' bytes
	634	# are no longer adjacent.
	635	print "found GX!\n" if $martian =~ /GX/;
	636
	637	Or like this:
	638
	639	@chars = $martian =~ m/([A-Z][A-Z]|[^A-Z])/g;
	640	# above is conceptually similar to: @chars = $text =~ m/(.)/g;
	641	#
	642	foreach $char (@chars) {
	643	print("found GX!\n"), last if $char eq 'GX';
	644	}
	645
	646	Or like this:
	647
	648	while ($martian =~ m/\G([A-Z][A-Z]|.)/gs) { # \G probably unneeded
	649	print("found GX!\n"), last if $1 eq 'GX';
	650	}
	651
	652	Or like this:
	653
	654	die "sorry, Perl doesn't (yet) have Martian support )-:\n";
	655
	656	There are many double- (and multi-) byte encodings commonly used these
	657	days. Some versions of these have 1-, 2-, 3-, and 4-byte characters,
	658	all mixed.
	659
	660	=head2 How do I match a pattern that is supplied by the user?
	661
	662	Well, if it's really a pattern, then just use
	663
	664	chomp($pattern = <STDIN>);
	665	if ($line =~ /$pattern/) { }
	666
	667	Or, since you have no guarantee that your user entered
	668	a valid regular expression, trap the exception this way:
	669
	670	if (eval { $line =~ /$pattern/ }) { }
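
A slightly fuller sketch of the same idea compiles the pattern once
with C<qr//> (available as of the 5.005 release) inside C<eval>, so
that a bad pattern is reported up front rather than at each match. The
helper name C<compile_user_pattern> is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: return a compiled regex, or undef if $pattern
# is not a valid regular expression (the error text is left in $@).
sub compile_user_pattern {
    my ($pattern) = @_;
    my $regex = eval { qr/$pattern/ };
    return $regex;
}

for my $pattern ('b+c', 'b(c') {
    my $regex = compile_user_pattern($pattern);
    if (defined $regex) {
        my $verdict = "abbc" =~ $regex ? "matches" : "does not match";
        print "'$pattern' is valid; 'abbc' $verdict\n";
    }
    else {
        print "'$pattern' is not a valid pattern\n";
    }
}
```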
	671
	672	But if all you really want is to search for a string, not a pattern,
	673	then you should either use the index() function, which is made for
	674	string searching, or if you can't be disabused of using a pattern
	675	match on a non-pattern, then be sure to use C<\Q>...C<\E>, documented
	676	in L<perlre>.
	677
	678	chomp($pattern = <STDIN>);
	679
	680	open (FILE, $input) or die "Couldn't open input $input: $!; aborting";
	681	while (<FILE>) {
	682	print if /\Q$pattern\E/;
	683	}
	684	close FILE;
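
For the index() route mentioned above, a minimal sketch might look like
this. C<contains_string> is a hypothetical helper name. index() returns
-1 when the substring is absent, and nothing in the search string is
ever interpreted as a regex metacharacter.

```perl
use strict;
use warnings;

# Hypothetical helper: true if $needle occurs literally in $haystack.
sub contains_string {
    my ($haystack, $needle) = @_;
    return index($haystack, $needle) >= 0 ? 1 : 0;
}

# "a.c" is treated as three literal characters, not as a pattern.
for my $line ("see a.c for details", "the abc of regexes") {
    my $status = contains_string($line, "a.c") ? "found" : "not found";
    print "$status: $line\n";
}
```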
	685
	686	=head1 AUTHOR AND COPYRIGHT
	687
	688	Copyright (c) 1997-1999 Tom Christiansen and Nathan Torkington.
	689	All rights reserved.
	690
	691	When included as part of the Standard Version of Perl, or as part of
	692	its complete documentation whether printed or otherwise, this work
	693	may be distributed only under the terms of Perl's Artistic License.
	694	Any distribution of this file or derivatives thereof I<outside>
	695	of that package require that special arrangements be made with
	696	copyright holder.
	697
	698	Irrespective of its distribution, all code examples in this file
	699	are hereby placed into the public domain. You are permitted and
	700	encouraged to use this code in your own programs for fun
	701	or for profit as you see fit. A simple comment in the code giving
	702	credit would be courteous but is not required.