Commit | Line | Data |
---|---|---|
2304df62 AD |
1 | Article 20075 of comp.lang.perl: |
2 | Newsgroups: comp.lang.perl | |
3 | Path: netlabs!news.cerf.net!ihnp4.ucsd.edu!swrinde!sgiblab!rpal.rockwell.com!imagen!pomeranz | |
4 | From: pomeranz@imagen.com (Hal Pomeranz) | |
5 | Subject: quotewords.pl [REVISED] | |
6 | Message-ID: <1994Mar23.071634.23171@aqm.com> | |
7 | Sender: usenet@aqm.com | |
8 | Nntp-Posting-Host: imagen | |
9 | Organization: QMS Inc., Santa Clara | |
10 | Date: Wed, 23 Mar 1994 07:16:34 GMT | |
11 | Lines: 132 | |
12 | ||
13 | ||
14 | ARRGH! The version I posted earlier tonight contained an error, so | |
15 | I've sent out a cancel to chase it down and kill it. Please use this | |
16 | version dated "23 March 1994". | |
17 | ||
18 | quotewords.pl is a generic replacement for shellwords.pl. | |
19 | "ewords() allows you to specify a delimiter, which may be a | |
20 | regular expression, and returns a list of words broken on that | |
21 | delimiter ignoring any instances of the delimiter which may appear | |
22 | within a quoted string. There's a boolean flag to tell the function | |
23 | whether or not you want it to strip quotes and backslashes or retain | |
24 | them. | |
25 | ||
26 | I've also included a revised version of &shellwords() (written in | |
27 | terms of "ewords() of course) which is 99% the same as the | |
28 | original version. The only difference is that the new version will | |
29 | not default to using $_ if no arguments are supplied. | |
30 | ||
31 | Share and enjoy... | |
32 | ||
33 | ============================================================================== | |
34 | Hal Pomeranz pomeranz@sclara.qms.com pomeranz@cs.swarthmore.edu | |
35 | System/Network Manager "All I can say is that my life is pretty plain. | |
36 | QMS Santa Clara I like watchin' the puddles gather rain." Blind Melon | |
37 | ============================================================================== | |
38 | ||
39 | # quotewords.pl | |
40 | # | |
41 | # Usage: | |
42 | # require 'quotes.pl'; | |
43 | # @words = "ewords($delim, $keep, @lines); | |
44 | # @words = &shellwords(@lines); | |
45 | ||
46 | # Hal Pomeranz (pomeranz@netcom.com), 23 March 1994 | |
47 | # Permission to use and distribute under the same terms as Perl. | |
48 | # No warranty expressed or implied. | |
49 | ||
50 | # Basically an update and generalization of the old shellwords.pl. | |
51 | # Much code shamelessly stolen from the old version (author unknown). | |
52 | # | |
53 | # "ewords() accepts a delimiter (which can be a regular expression) | |
54 | # and a list of lines and then breaks those lines up into a list of | |
55 | # words ignoring delimiters that appear inside quotes. | |
56 | # | |
57 | # The $keep argument is a boolean flag. If true, the quotes are kept | |
58 | # with each word, otherwise quotes are stripped in the splitting process. | |
59 | # $keep also defines whether unprotected backslashes are retained. | |
60 | # | |
61 | # A &shellwords() replacement is included to demonstrate the new package. | |
62 | # This version differs from the original in that it will _NOT_ default | |
63 | # to using $_ if no arguments are given. I personally find the old behavior | |
64 | # to be a mis-feature. | |
65 | ||
66 | package quotewords; | |
67 | ||
68 | sub main'shellwords { | |
69 | local(@lines) = @_; | |
70 | $lines[$#lines] =~ s/\s+$//; | |
71 | &main'quotewords('\s+', 0, @lines); | |
72 | } | |
73 | ||
74 | ||
75 | # "ewords() works by simply jamming all of @lines into a single | |
76 | # string in $_ and then pulling off words a bit at a time until $_ | |
77 | # is exhausted. | |
78 | # | |
79 | # The inner "for" loop builds up each word (or $field) one $snippet | |
80 | # at a time. A $snippet is a quoted string, a backslashed character, | |
81 | # or an unquoted string. We fall out of the "for" loop when we reach | |
82 | # the end of $_ or when we hit a delimiter. Falling out of the "for" | |
83 | # loop, we push the $field we've been building up onto the list of | |
84 | # @words we'll be returning, and then loop back and pull another word | |
85 | # off of $_. | |
86 | # | |
87 | # The first two cases inside the "for" loop deal with quoted strings. | |
88 | # The first case matches a double quoted string, removes it from $_, | |
89 | # and assigns the double quoted string to $snippet in the body of the | |
90 | # conditional. The second case handles single quoted strings. In | |
91 | # the third case we've found a quote at the current beginning of $_, | |
92 | # but it didn't match the quoted string regexps in the first two cases, | |
93 | # so it must be an unbalanced quote and we die with an error (which can | |
94 | # be caught by eval()). | |
95 | # | |
96 | # The next case handles backslashed characters, and the next case is the | |
97 | # exit case on reaching the end of the string or finding a delimiter. | |
98 | # | |
99 | # Otherwise, we've found an unquoted thing and we pull of characters one | |
100 | # at a time until we reach something that could start another $snippet-- | |
101 | # a quote of some sort, a backslash, or the delimiter. This one character | |
102 | # at a time behavior was necessary if the delimiter was going to be a | |
103 | # regexp (love to hear it if you can figure out a better way). | |
104 | ||
105 | sub main'quotewords { | |
106 | local($delim, $keep, @lines) = @_; | |
107 | local(@words,$snippet,$field,$_); | |
108 | ||
109 | $_ = join('', @lines); | |
110 | while ($_) { | |
111 | $field = ''; | |
112 | for (;;) { | |
113 | $snippet = ''; | |
114 | if (s/^"(([^"\\]|\\[\\"])*)"//) { | |
115 | $snippet = $1; | |
116 | $snippet = "\"$snippet\"" if ($keep); | |
117 | } | |
118 | elsif (s/^'(([^'\\]|\\[\\'])*)'//) { | |
119 | $snippet = $1; | |
120 | $snippet = "'$snippet'" if ($keep); | |
121 | } | |
122 | elsif (/^["']/) { | |
123 | die "Unmatched quote\n"; | |
124 | } | |
125 | elsif (s/^\\(.)//) { | |
126 | $snippet = $1; | |
127 | $snippet = "\\$snippet" if ($keep); | |
128 | } | |
129 | elsif (!$_ || s/^$delim//) { | |
130 | last; | |
131 | } | |
132 | else { | |
133 | while ($_ && !(/^$delim/ || /^['"\\]/)) { | |
134 | $snippet .= substr($_, 0, 1); | |
135 | substr($_, 0, 1) = ''; | |
136 | } | |
137 | } | |
138 | $field .= $snippet; | |
139 | } | |
140 | push(@words, $field); | |
141 | } | |
142 | @words; | |
143 | } | |
144 | 1; | |
145 | ||
146 |