This is a live mirror of the Perl 5 development currently hosted at https://github.com/perl/perl5
perl 5.0 alpha 8
[perl5.git] / lib / quotewords.pl.art
CommitLineData
2304df62
AD
1Article 20075 of comp.lang.perl:
2Newsgroups: comp.lang.perl
3Path: netlabs!news.cerf.net!ihnp4.ucsd.edu!swrinde!sgiblab!rpal.rockwell.com!imagen!pomeranz
4From: pomeranz@imagen.com (Hal Pomeranz)
5Subject: quotewords.pl [REVISED]
6Message-ID: <1994Mar23.071634.23171@aqm.com>
7Sender: usenet@aqm.com
8Nntp-Posting-Host: imagen
9Organization: QMS Inc., Santa Clara
10Date: Wed, 23 Mar 1994 07:16:34 GMT
11Lines: 132
12
13
14ARRGH! The version I posted earlier tonight contained an error, so
15I've sent out a cancel to chase it down and kill it. Please use this
16version dated "23 March 1994".
17
18quotewords.pl is a generic replacement for shellwords.pl.
19&quotewords() allows you to specify a delimiter, which may be a
20regular expression, and returns a list of words broken on that
21delimiter ignoring any instances of the delimiter which may appear
22within a quoted string. There's a boolean flag to tell the function
23whether or not you want it to strip quotes and backslashes or retain
24them.
25
26I've also included a revised version of &shellwords() (written in
27terms of &quotewords() of course) which is 99% the same as the
28original version. The only difference is that the new version will
29not default to using $_ if no arguments are supplied.
30
31Share and enjoy...
32
33==============================================================================
34 Hal Pomeranz pomeranz@sclara.qms.com pomeranz@cs.swarthmore.edu
35System/Network Manager "All I can say is that my life is pretty plain.
36 QMS Santa Clara I like watchin' the puddles gather rain." Blind Melon
37==============================================================================
38
39# quotewords.pl
40#
41# Usage:
42# require 'quotes.pl';
43# @words = &quotewords($delim, $keep, @lines);
44# @words = &shellwords(@lines);
45
46# Hal Pomeranz (pomeranz@netcom.com), 23 March 1994
47# Permission to use and distribute under the same terms as Perl.
48# No warranty expressed or implied.
49
50# Basically an update and generalization of the old shellwords.pl.
51# Much code shamelessly stolen from the old version (author unknown).
52#
53# &quotewords() accepts a delimiter (which can be a regular expression)
54# and a list of lines and then breaks those lines up into a list of
55# words ignoring delimiters that appear inside quotes.
56#
57# The $keep argument is a boolean flag. If true, the quotes are kept
58# with each word, otherwise quotes are stripped in the splitting process.
59# $keep also defines whether unprotected backslashes are retained.
60#
61# A &shellwords() replacement is included to demonstrate the new package.
62# This version differs from the original in that it will _NOT_ default
63# to using $_ if no arguments are given. I personally find the old behavior
64# to be a mis-feature.
65
66package quotewords;
67
68sub main'shellwords {
69 local(@lines) = @_;
70 $lines[$#lines] =~ s/\s+$//;
71 &main'quotewords('\s+', 0, @lines);
72}
73
74
75# &quotewords() works by simply jamming all of @lines into a single
76# string in $_ and then pulling off words a bit at a time until $_
77# is exhausted.
78#
79# The inner "for" loop builds up each word (or $field) one $snippet
80# at a time. A $snippet is a quoted string, a backslashed character,
81# or an unquoted string. We fall out of the "for" loop when we reach
82# the end of $_ or when we hit a delimiter. Falling out of the "for"
83# loop, we push the $field we've been building up onto the list of
84# @words we'll be returning, and then loop back and pull another word
85# off of $_.
86#
87# The first two cases inside the "for" loop deal with quoted strings.
88# The first case matches a double quoted string, removes it from $_,
89# and assigns the double quoted string to $snippet in the body of the
90# conditional. The second case handles single quoted strings. In
91# the third case we've found a quote at the current beginning of $_,
92# but it didn't match the quoted string regexps in the first two cases,
93# so it must be an unbalanced quote and we die with an error (which can
94# be caught by eval()).
95#
96# The next case handles backslashed characters, and the next case is the
97# exit case on reaching the end of the string or finding a delimiter.
98#
99# Otherwise, we've found an unquoted thing and we pull of characters one
100# at a time until we reach something that could start another $snippet--
101# a quote of some sort, a backslash, or the delimiter. This one character
102# at a time behavior was necessary if the delimiter was going to be a
103# regexp (love to hear it if you can figure out a better way).
104
105sub main'quotewords {
106 local($delim, $keep, @lines) = @_;
107 local(@words,$snippet,$field,$_);
108
109 $_ = join('', @lines);
110 while ($_) {
111 $field = '';
112 for (;;) {
113 $snippet = '';
114 if (s/^"(([^"\\]|\\[\\"])*)"//) {
115 $snippet = $1;
116 $snippet = "\"$snippet\"" if ($keep);
117 }
118 elsif (s/^'(([^'\\]|\\[\\'])*)'//) {
119 $snippet = $1;
120 $snippet = "'$snippet'" if ($keep);
121 }
122 elsif (/^["']/) {
123 die "Unmatched quote\n";
124 }
125 elsif (s/^\\(.)//) {
126 $snippet = $1;
127 $snippet = "\\$snippet" if ($keep);
128 }
129 elsif (!$_ || s/^$delim//) {
130 last;
131 }
132 else {
133 while ($_ && !(/^$delim/ || /^['"\\]/)) {
134 $snippet .= substr($_, 0, 1);
135 substr($_, 0, 1) = '';
136 }
137 }
138 $field .= $snippet;
139 }
140 push(@words, $field);
141 }
142 @words;
143}
1441;
145
146