=head1 NAME

perlfilter - Source Filters

=head1 DESCRIPTION

This article is about a little-known feature of Perl called
I<source filters>. Source filters alter the program text of a module
before Perl sees it, much as a C preprocessor alters the source text of
a C program before the compiler sees it. This article tells you more
about what source filters are, how they work, and how to write your
own.

The original purpose of source filters was to let you encrypt your
program source to prevent casual piracy. This isn't all they can do, as
you'll soon learn. But first, the basics.

=head1 CONCEPTS

Before the Perl interpreter can execute a Perl script, it must first
read it from a file into memory for parsing and compilation. If that
script itself includes other scripts with a C<use> or C<require>
statement, then each of those scripts will have to be read from its
respective file as well.

Now think of each logical connection between the Perl parser and an
individual file as a I<source stream>. A source stream is created when
the Perl parser opens a file; it continues to exist as the source code
is read into memory, and it is destroyed when Perl has finished parsing
the file. If the parser encounters a C<require> or C<use> statement in
a source stream, a new and distinct stream is created just for that
file.

The diagram below represents a single source stream, with the flow of
source from a Perl script file on the left into the Perl parser on the
right. This is how Perl normally operates.

    file -------> parser

There are two important points to remember:

=over 5

=item 1.

Although there can be any number of source streams in existence at any
given time, only one will be active.

=item 2.

Every source stream is associated with only one file.

=back

A source filter is a special kind of Perl module that intercepts and
modifies a source stream before it reaches the parser. A source filter
changes our diagram like this:

    file ----> filter ----> parser

If that doesn't make much sense, consider the analogy of a command
pipeline. Say you have a shell script stored in the compressed file
I<trial.gz>. The simple pipeline command below runs the script without
needing to create a temporary file to hold the uncompressed file.

    gunzip -c trial.gz | sh

In this case, the data flow from the pipeline can be represented as follows:

    trial.gz ----> gunzip ----> sh

With source filters, you can store the text of your script compressed
and use a source filter to uncompress it for Perl's parser:

     compressed           gunzip
    Perl program ---> source filter ---> parser

=head1 USING FILTERS

So how do you use a source filter in a Perl script? Above, I said that
a source filter is just a special kind of module. Like all Perl
modules, a source filter is invoked with a C<use> statement.

Say you want to pass your Perl source through the C preprocessor before
execution. As it happens, the source filters distribution comes with a C
preprocessor filter module called Filter::cpp.

Below is an example program, C<cpp_test>, which makes use of this filter.
Line numbers have been added to allow specific lines to be referenced
easily.

    1: use Filter::cpp;
    2: #define TRUE 1
    3: $a = TRUE;
    4: print "a = $a\n";

When you execute this script, Perl creates a source stream for the
file. Before the parser processes any of the lines from the file, the
source stream looks like this:

    cpp_test ---------> parser

Line 1, C<use Filter::cpp>, includes and installs the C<cpp> filter
module. All source filters work this way. The C<use> statement is compiled
and executed at compile time, before any more of the file is read, and
it attaches the cpp filter to the source stream behind the scenes. Now
the data flow looks like this:

    cpp_test ----> cpp filter ----> parser

As the parser reads the second and subsequent lines from the source
stream, it feeds those lines through the C<cpp> source filter before
processing them. The C<cpp> filter simply passes each line through the
real C preprocessor. The output from the C preprocessor is then
inserted back into the source stream by the filter.

                      .-> cpp --.
                      |         |
                      |         |
                      |       <-'
    cpp_test ----> cpp filter ----> parser

The parser then sees the following code:

    use Filter::cpp;
    $a = 1;
    print "a = $a\n";

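To make the transformation concrete, here is a toy sketch of what happens to lines 2 to 4 once the filter is installed. It is not how C<Filter::cpp> works (that module hands the text to the real C preprocessor): the hypothetical C<toy_cpp> function understands only simple C<#define NAME VALUE> lines, and it drops them so that its output matches the listing above.

```perl
use strict;
use warnings;

# Toy stand-in for the C preprocessor (hypothetical helper): handles only
# object-like "#define NAME VALUE" lines and substitutes NAME as a whole
# word everywhere else, naively including strings and comments.
sub toy_cpp {
    my @lines = @_;
    my %macro;
    my @out;
    for my $line (@lines) {
        if ($line =~ /^#define\s+(\w+)\s+(.*?)\s*$/) {
            $macro{$1} = $2;    # remember the macro, emit nothing for it
            next;
        }
        for my $name (keys %macro) {
            $line =~ s/\b\Q$name\E\b/$macro{$name}/g;
        }
        push @out, $line;
    }
    return join '', @out;
}

# Lines 2-4 of cpp_test: the lines that reach the filter after
# "use Filter::cpp;" has installed it.
my $filtered = toy_cpp(
    "#define TRUE 1\n",
    "\$a = TRUE;\n",
    "print \"a = \$a\\n\";\n",
);
print $filtered;
```

Like any line-oriented filter, the toy does not know Perl. A C<TRUE> inside a string would be replaced too.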
Let's consider what happens when the filtered code includes another
module with C<use>:

    1: use Filter::cpp;
    2: #define TRUE 1
    3: use Fred;
    4: $a = TRUE;
    5: print "a = $a\n";

The C<cpp> filter does not apply to the text of the Fred module, only
to the text of the file that used it (C<cpp_test>). Although the use
statement on line 3 will pass through the cpp filter, the module that
gets included (C<Fred>) will not. The source streams look like this
after line 3 has been parsed and before line 4 is parsed:

    cpp_test ---> cpp filter ---> parser (INACTIVE)

    Fred.pm ----> parser

As you can see, a new stream has been created for reading the source
from C<Fred.pm>. This stream will remain active until all of C<Fred.pm>
has been parsed. The source stream for C<cpp_test> will still exist,
but is inactive. Once the parser has finished reading C<Fred.pm>, the
source stream associated with it will be destroyed. The source stream
for C<cpp_test> then becomes active again and the parser reads line 4
and subsequent lines from C<cpp_test>.

You can use more than one source filter on a single file. Similarly,
you can reuse the same filter in as many files as you like.

For example, if you have a uuencoded and compressed source file, it is
possible to stack a uudecode filter and an uncompression filter like
this:

    use Filter::uudecode; use Filter::uncompress;
    M'XL(".H<US4''V9I;F%L')Q;>7/;1I;_>_I3=&E=%:F*I"T?22Q/
    M6]9*<IQCO*XFT"0[PL%%'Y+IG?WN^ZYN-$'J.[.JE$,20/?K=_[>
    ...

Once the first line has been processed, the flow will look like this:

    file ---> uudecode ---> uncompress ---> parser
               filter         filter

Data flows through filters in the same order they appear in the source
file. The uudecode filter appeared before the uncompress filter, so the
source file will be uudecoded before it's uncompressed.

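The ordering rule can be checked with a small round trip. This sketch makes no assumptions about how C<Filter::uudecode> and C<Filter::uncompress> are implemented. Instead, it uses Perl's built-in C<pack 'u'> and C<unpack 'u'> for the uuencoding step, and the core C<IO::Compress::Gzip> and C<IO::Uncompress::Gunzip> modules as stand-ins for the compression step. Because the file was compressed first and uuencoded second, decoding has to run in the opposite order. That is exactly the order in which the two C<use> lines install the filters.

```perl
use strict;
use warnings;
use IO::Compress::Gzip     qw(gzip $GzipError);
use IO::Uncompress::Gunzip qw(gunzip $GunzipError);

my $source = qq{print "hello from a stacked filter\\n";\n};

# Encoding order: compress first, then uuencode.
my $compressed;
gzip(\$source => \$compressed) or die "gzip failed: $GzipError\n";
my $uuencoded = pack 'u', $compressed;

# Decoding undoes the steps in reverse: uudecode, then uncompress.
# This mirrors "use Filter::uudecode; use Filter::uncompress;".
my $uudecoded = unpack 'u', $uuencoded;
my $restored;
gunzip(\$uudecoded => \$restored) or die "gunzip failed: $GunzipError\n";

print $uuencoded;
print $restored;
```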
=head1 WRITING A SOURCE FILTER

There are three ways to write your own source filter. You can write it
in C, use an external program as a filter, or write the filter in Perl.
I won't cover the first two in any great detail, so I'll get them out
of the way first. Writing the filter in Perl is most convenient, so
I'll devote the most space to it.

=head1 WRITING A SOURCE FILTER IN C

The first of the three available techniques is to write the filter
completely in C. The external module you create interfaces directly
with the source filter hooks provided by Perl.

The advantage of this technique is that you have complete control over
the implementation of your filter. The big disadvantage is the
increased complexity required to write the filter: not only do you
need to understand the source filter hooks, but you also need a
reasonable knowledge of Perl guts. One of the few times it is worth
going to this trouble is when writing a source scrambler. The
C<decrypt> filter (which unscrambles the source before Perl parses it)
included with the source filter distribution is an example of a C
source filter (see Decryption Filters, below).

=over 5

=item B<Decryption Filters>

All decryption filters work on the principle of "security through
obscurity." Regardless of how well you write a decryption filter and
how strong your encryption algorithm is, anyone determined enough can
retrieve the original source code. The reason is quite simple: once
the decryption filter has decrypted the source back to its original
form, fragments of it will be stored in the computer's memory as Perl
parses it. The source might only be in memory for a short period of
time, but anyone possessing a debugger, skill, and lots of patience can
eventually reconstruct your program.

That said, there are a number of steps that can be taken to make life
difficult for the potential cracker. The most important: Write your
decryption filter in C and statically link the decryption module into
the Perl binary. For further tips to make life difficult for the
potential cracker, see the file I<decrypt.pm> in the source filters
distribution.

=back

=head1 CREATING A SOURCE FILTER AS A SEPARATE EXECUTABLE

An alternative to writing the filter in C is to create a separate
executable in the language of your choice. The separate executable
reads from standard input, does whatever processing is necessary, and
writes the filtered data to standard output. C<Filter::cpp> is an
example of a source filter implemented as a separate executable: the
executable is the C preprocessor bundled with your C compiler.

The source filter distribution includes two modules that simplify this
task: C<Filter::exec> and C<Filter::sh>. Both allow you to run any
external executable. Both use a coprocess to control the flow of data
into and out of the external executable. (For details on coprocesses,
see Stevens, W.R., "Advanced Programming in the UNIX Environment."
Addison-Wesley, ISBN 0-201-56317-7, pages 441-445.) The difference
between them is that C<Filter::exec> spawns the external command
directly, while C<Filter::sh> spawns a shell to execute the external
command. (Unix uses the Bourne shell; NT uses the cmd shell.) Spawning
a shell allows you to make use of the shell metacharacters and
redirection facilities.

Here is an example script that uses C<Filter::sh>:

    use Filter::sh 'tr XYZ PQR';
    $a = 1;
    print "XYZ a = $a\n";

The output you'll get when the script is executed:

    PQR a = 1

Writing a source filter as a separate executable works fine, but a
small performance penalty is incurred. For example, if you execute the
small example above, a separate subprocess will be created to run the
Unix C<tr> command. Each use of the filter requires its own subprocess.
If creating subprocesses is expensive on your system, you might want to
consider one of the other options for creating source filters.

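The coprocess idea behind C<Filter::exec> and C<Filter::sh> can be sketched with the core C<IPC::Open2> module. The sketch writes the source text to the external command's standard input, closes it, and then reads the filtered text back from the command's standard output. This is an illustration only, not the implementation used by the Filter distribution, and it assumes a Unix-like system with C<tr> on the C<PATH>.

```perl
use strict;
use warnings;
use IPC::Open2 qw(open2);

# The text that follows "use Filter::sh 'tr XYZ PQR';" in the example.
my $source = qq{\$a = 1;\nprint "XYZ a = \$a\\n";\n};

# Start tr as a coprocess: we write to its stdin and read its stdout.
my $pid = open2(my $from_cmd, my $to_cmd, 'tr', 'XYZ', 'PQR');
print {$to_cmd} $source;
close $to_cmd;    # EOF lets tr flush its output and exit

my $filtered = do { local $/; <$from_cmd> };
close $from_cmd;
waitpid $pid, 0;
die "tr exited with status $?\n" if $?;

print $filtered;    # the source text the parser would receive
```

Writing all of the input before reading any output is fine for a few lines. With large inputs, however, both pipes can fill up and deadlock, so a sturdier version would need to interleave reads and writes.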
=head1 WRITING A SOURCE FILTER IN PERL

The easiest and most portable option available for creating your own
source filter is to write it completely in Perl. To distinguish this
from the previous two techniques, I'll call it a Perl source filter.

To help understand how to write a Perl source filter we need an example
to study. Here is a complete source filter that performs rot13
decoding. (Rot13 is a very simple encryption scheme used in Usenet
postings to hide the contents of offensive posts. It moves every letter
forward thirteen places, so that A becomes N, B becomes O, and Z
becomes M.)

    package Rot13;

    use Filter::Util::Call;

    sub import {
        my ($type) = @_;
        my ($ref) = [];
        filter_add(bless $ref);
    }

    sub filter {
        my ($self) = @_;
        my ($status);

        tr/n-za-mN-ZA-M/a-zA-Z/
            if ($status = filter_read()) > 0;
        $status;
    }

    1;

All Perl source filters are implemented as Perl classes and have the
same basic structure as the example above.

First, we include the C<Filter::Util::Call> module, which exports a
number of functions into your filter's namespace. The filter shown
above uses two of these functions, C<filter_add()> and
C<filter_read()>.

Next, we create the filter object and associate it with the source
stream by defining the C<import> function. If you know Perl well
enough, you know that C<import> is called automatically every time a
module is included with a C<use> statement. This makes C<import> the ideal
place to both create and install a filter object.

In the example filter, the object (C<$ref>) is blessed just like any
other Perl object. Our example uses an anonymous array, but this isn't
a requirement. Because this example doesn't need to store any context
information, we could have used a scalar or hash reference just as
well. The next section demonstrates context data.

The association between the filter object and the source stream is made
with the C<filter_add()> function. This takes a filter object as a
parameter (C<$ref> in this case) and installs it in the source stream.

Finally, there is the code that actually does the filtering. For this
type of Perl source filter, all the filtering is done in a method
called C<filter()>. (It is also possible to write a Perl source filter
using a closure. See the C<Filter::Util::Call> manual page for more
details.) It's called every time the Perl parser needs another line of
source to process. The C<filter()> method, in turn, reads lines from
the source stream using the C<filter_read()> function.

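For comparison, here is a minimal sketch of the closure form, wrapped in a small harness that runs it for real. The module name C<Rot13Closure> and the file names are hypothetical. The closure passed to C<filter_add()> takes the place of the blessed object and its C<filter()> method, but it still calls C<filter_read()> and works on C<$_>. A source filter only acts on a file that Perl is compiling, so the harness writes the module and a rot13-encoded script into a temporary directory and runs the script with a separate C<perl> (C<$^X>).

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);
use File::Spec;

# Hypothetical closure-based variant of the Rot13 filter.
my $module = <<'PM';
package Rot13Closure;
use Filter::Util::Call;

sub import {
    # The closure replaces the blessed object and its filter() method.
    filter_add(sub {
        my $status = filter_read();    # appends the next line to $_
        tr/n-za-mN-ZA-M/a-zA-Z/ if $status > 0;
        return $status;                # >0 line read, 0 EOF, <0 error
    });
}
1;
PM

# rot13-encoded form of: print "hello fred\n";
my $script = qq{use Rot13Closure;\ncevag "uryyb serq\\a";\n};

my $dir = tempdir(CLEANUP => 1);
for ([ 'Rot13Closure.pm', $module ], [ 'hello.pl', $script ]) {
    my ($name, $text) = @$_;
    my $path = File::Spec->catfile($dir, $name);
    open my $fh, '>', $path or die "Cannot write $path: $!\n";
    print {$fh} $text;
    close $fh or die "Cannot close $path: $!\n";
}

# Run the encoded script in a separate perl so the filter sees a real file.
open my $run, '-|', $^X, "-I$dir", File::Spec->catfile($dir, 'hello.pl')
    or die "Cannot run perl: $!\n";
my $output = do { local $/; <$run> };
close $run;

print $output;
```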
If a line was available from the source stream, C<filter_read()>
returns a status value greater than zero and appends the line to C<$_>.
A status value of zero indicates end-of-file, and a value less than zero
means an error. The filter function itself is expected to return its
status in the same way, and put the filtered line it wants written to
the source stream in C<$_>. The use of C<$_> accounts for the brevity
of most Perl source filters.

In order to make use of the rot13 filter we need some way of encoding
the source file in rot13 format. The script below, C<mkrot13>, does
just that.

    use strict;
    use warnings;

    die "usage: mkrot13 filename\n" unless @ARGV;
    my $in  = $ARGV[0];
    my $out = "$in.tmp";
    open(my $in_fh,  '<', $in)  or die "Cannot open file $in: $!\n";
    open(my $out_fh, '>', $out) or die "Cannot open file $out: $!\n";

    # The encoded file starts by installing the decoding filter.
    print $out_fh "use Rot13;\n";
    while (<$in_fh>) {
        tr/a-zA-Z/n-za-mN-ZA-M/;    # rot13-encode each line
        print $out_fh $_;
    }

    close $in_fh;
    close $out_fh or die "Cannot close file $out: $!\n";

    # Replace the original file with its encoded version.
    unlink $in;
    rename $out, $in;

If we encrypt this with C<mkrot13>:

    print " hello fred \n";

the result will be this:

    use Rot13;
    cevag " uryyb serq \a";

Running it produces this output:

    hello fred

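Rot13 is its own inverse, so the C<tr///> in C<mkrot13> and the one in the C<Rot13> filter are really the same mapping written two ways. This small sketch uses the hypothetical helper names C<rot13_encode> and C<rot13_decode>. It checks that encoding the example line and then decoding it again gives back the original text.

```perl
use strict;
use warnings;

# Hypothetical helpers: the mapping used by mkrot13 and by the filter.
sub rot13_encode { (my $text = shift) =~ tr/a-zA-Z/n-za-mN-ZA-M/; return $text }
sub rot13_decode { (my $text = shift) =~ tr/n-za-mN-ZA-M/a-zA-Z/; return $text }

my $line    = qq{print " hello fred \\n";\n};
my $encoded = rot13_encode($line);       # what mkrot13 writes
my $decoded = rot13_decode($encoded);    # what the filter hands to the parser

print $encoded, $decoded;
```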
=head1 USING CONTEXT: THE DEBUG FILTER

The rot13 filter was a trivial example. Here's another demonstration
that shows off a few more features.

Say you wanted to include a lot of debugging code in your Perl script
during development, but you didn't want it available in the released
product. Source filters offer a solution. In order to keep the example
simple, let's say you wanted the debugging output to be controlled by
an environment variable, C<DEBUG>. Debugging code is enabled if the
variable exists; otherwise it is disabled.

Two special marker lines will bracket debugging code, like this:

    ## DEBUG_BEGIN
    if ($year > 1999) {
       warn "Debug: millennium bug in year $year\n";
    }
    ## DEBUG_END

The filter ensures that Perl parses the code between the C<DEBUG_BEGIN>
and C<DEBUG_END> markers only when the C<DEBUG> environment variable
exists. That means that when C<DEBUG> does exist, the code above
should be passed through the filter unchanged. The marker lines can
also be passed through as-is, because the Perl parser will see them as
comment lines. When C<DEBUG> isn't set, we need a way to disable the
debug code. A simple way to achieve that is to convert the lines
between the two markers into comments:

    ## DEBUG_BEGIN
    #if ($year > 1999) {
    #     warn "Debug: millennium bug in year $year\n";
    #}
    ## DEBUG_END

Here is the complete Debug filter:

    package Debug;

    use strict;
    use warnings;
    use Filter::Util::Call;

    use constant TRUE => 1;
    use constant FALSE => 0;

    sub import {
        my ($type) = @_;
        my (%context) = (
          Enabled       => defined $ENV{DEBUG},
          InTraceBlock  => FALSE,
          Filename      => (caller)[1],
          LineNo        => 0,
          LastBegin     => 0,
        );
        filter_add(bless \%context);
    }

    sub Die {
        my ($self) = shift;
        my ($message) = shift;
        my ($line_no) = shift || $self->{LastBegin};
        die "$message at $self->{Filename} line $line_no.\n"
    }

    sub filter {
        my ($self) = @_;
        my ($status);
        $status = filter_read();
        ++ $self->{LineNo};

        # deal with EOF/error first
        if ($status <= 0) {
            $self->Die("DEBUG_BEGIN has no DEBUG_END")
                if $self->{InTraceBlock};
            return $status;
        }

        if ($self->{InTraceBlock}) {
            if (/^\s*##\s*DEBUG_BEGIN/ ) {
                $self->Die("Nested DEBUG_BEGIN", $self->{LineNo})
            } elsif (/^\s*##\s*DEBUG_END/) {
                $self->{InTraceBlock} = FALSE;
            }

            # comment out the debug lines when the filter is disabled
            s/^/#/ if ! $self->{Enabled};
        } elsif ( /^\s*##\s*DEBUG_BEGIN/ ) {
            $self->{InTraceBlock} = TRUE;
            $self->{LastBegin} = $self->{LineNo};
        } elsif ( /^\s*##\s*DEBUG_END/ ) {
            $self->Die("DEBUG_END has no DEBUG_BEGIN", $self->{LineNo});
        }
        return $status;
    }

    1;

The big difference between this filter and the previous example is the
use of context data in the filter object. The filter object is based on
a hash reference, and is used to keep various pieces of context
information between calls to the filter function. All but two of the
hash fields are used for error reporting. The first of those two,
Enabled, is used by the filter to determine whether the debugging code
should be given to the Perl parser. The second, InTraceBlock, is true
when the filter has encountered a C<DEBUG_BEGIN> line, but has not yet
encountered the following C<DEBUG_END> line.

If you ignore all the error checking that most of the code does, the
essence of the filter is as follows:

    sub filter {
        my ($self) = @_;
        my ($status);
        $status = filter_read();

        # deal with EOF/error first
        return $status if $status <= 0;
        if ($self->{InTraceBlock}) {
            if (/^\s*##\s*DEBUG_END/) {
                $self->{InTraceBlock} = FALSE;
            }

            # comment out debug lines when the filter is disabled
            s/^/#/ if ! $self->{Enabled};
        } elsif ( /^\s*##\s*DEBUG_BEGIN/ ) {
            $self->{InTraceBlock} = TRUE;
        }
        return $status;
    }

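To see what this essential logic does to the marker example without installing a real filter, the loop below applies the same state machine to an array of lines. It is a simulation that assumes each array element is one line as returned by C<filter_read()>. The function name C<debug_filter_lines> is hypothetical, and so is its C<$enabled> argument, which stands in for C<defined $ENV{DEBUG}>. Note one detail of the code above: the C<DEBUG_END> line is still inside the block when it is processed. When the filter is disabled, that line therefore gains a C<#> too and becomes C<### DEBUG_END>, which is harmless because it was already a comment.

```perl
use strict;
use warnings;

# Simulates the essential Debug filter over a list of lines.
sub debug_filter_lines {
    my ($enabled, @lines) = @_;
    my $in_trace_block = 0;
    my @out;
    for my $line (@lines) {
        if ($in_trace_block) {
            $in_trace_block = 0 if $line =~ /^\s*##\s*DEBUG_END/;
            # comment out debug lines (including the END marker) when disabled
            $line =~ s/^/#/ unless $enabled;
        }
        elsif ($line =~ /^\s*##\s*DEBUG_BEGIN/) {
            $in_trace_block = 1;
        }
        push @out, $line;
    }
    return join '', @out;
}

my @source = (
    "## DEBUG_BEGIN\n",
    "if (\$year > 1999) {\n",
    "    warn \"Debug: millennium bug in year \$year\\n\";\n",
    "}\n",
    "## DEBUG_END\n",
);

print debug_filter_lines(0, @source);    # as if DEBUG were not set
```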
Be warned: just as the C preprocessor doesn't know C, the Debug filter
doesn't know Perl. It can be fooled quite easily:

    print <<EOM;
    ##DEBUG_BEGIN
    EOM

Here the line inside the here-document looks like a C<DEBUG_BEGIN>
marker, so the filter starts a debug block in the middle of a string.

Such things aside, you can see that a lot can be achieved with a modest
amount of code.

=head1 CONCLUSION

You now have a better understanding of what a source filter is, and you
might even have a possible use for one. If you feel like playing with
source filters but need a bit of inspiration, here are some extra
features you could add to the Debug filter.

First, an easy one. Rather than having debugging code that is
all-or-nothing, it would be much more useful to be able to control
which specific blocks of debugging code get included. Try extending the
syntax for debug blocks to allow each to be identified. The contents of
the C<DEBUG> environment variable can then be used to control which
blocks get included.

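Here is one possible sketch of that first exercise, written as a plain function over lines rather than a complete filter. The syntax C<## DEBUG_BEGIN name>, the comma-separated format for C<DEBUG>, and the function name C<filter_named_blocks> are all assumptions made for illustration.

```perl
use strict;
use warnings;

# Hypothetical syntax: "## DEBUG_BEGIN name" ... "## DEBUG_END".
# A block is kept only if its name appears in the comma-separated
# $debug string (standing in for $ENV{DEBUG}); others are commented out.
sub filter_named_blocks {
    my ($debug, @lines) = @_;
    my %wanted = map { ($_ => 1) } grep { length } split /\s*,\s*/, ($debug // '');
    my $current;    # name of the block we are inside, if any
    my @out;
    for my $line (@lines) {
        if (defined $current) {
            my $at_end = $line =~ /^\s*##\s*DEBUG_END/;
            $line =~ s/^/#/ unless $wanted{$current};
            undef $current if $at_end;
        }
        elsif ($line =~ /^\s*##\s*DEBUG_BEGIN\s+(\w+)/) {
            $current = $1;
        }
        push @out, $line;
    }
    return join '', @out;
}

my @source = (
    "## DEBUG_BEGIN dates\n",
    "warn \"year check\\n\";\n",
    "## DEBUG_END\n",
    "## DEBUG_BEGIN io\n",
    "warn \"io check\\n\";\n",
    "## DEBUG_END\n",
);

print filter_named_blocks('dates', @source);    # keep only the "dates" block
```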
Once you can identify individual blocks, try allowing them to be
nested. That isn't difficult either.

Here is an interesting idea that doesn't involve the Debug filter.
When this article was written, Perl subroutines had fairly limited
support for formal parameter lists. You could use a prototype to
specify the number of parameters and their types, but you still had to
take them out of the C<@_> array yourself. (Perl 5.20 later added
experimental subroutine signatures, which became stable in Perl 5.36.)
Write a source filter that allows you to have a named parameter list.
Such a filter would turn this:

    sub MySub ($first, $second, @rest) { ... }

into this:

    sub MySub($$@) {
       my ($first) = shift;
       my ($second) = shift;
       my (@rest) = @_;
       ...
    }

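As a starting point for this exercise, here is a naive sketch of the rewrite itself, done as a plain string transformation. It assumes that the whole C<sub> header fits on one line and that every parameter starts with a sigil. The function names C<expand_params> and C<expand_header> are hypothetical. Like the Debug filter, the sketch does not understand Perl, so a matching line inside a string or here-document would be rewritten too.

```perl
use strict;
use warnings;

# Build the replacement text for one header (hypothetical helper).
sub expand_header {
    my ($name, $list) = @_;              # copy $1/$2 before running more regexes
    my @params = split /\s*,\s*/, $list;
    my $proto  = join '', map { substr($_, 0, 1) } @params;
    my $body   = join '', map {
        substr($_, 0, 1) eq '$'
            ? "    my ($_) = shift;\n"   # scalar: take one argument
            : "    my ($_) = \@_;\n"     # array or hash: take the rest
    } @params;
    return "sub $name($proto) {\n$body";
}

# Rewrite every one-line "sub NAME (PARAMS) {" header in $code.
sub expand_params {
    my ($code) = @_;
    $code =~ s{
        \b sub \s+ (\w+) \s* \( \s* ( [\$\@%] [^)]*? ) \s* \) \s* \{ [ \t]* \n?
    }{expand_header($1, $2)}gex;
    return $code;
}

print expand_params("sub MySub (\$first, \$second, \@rest) {\n    ...\n}\n");
```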
Finally, if you feel like a real challenge, have a go at writing a
full-blown Perl macro preprocessor as a source filter. Borrow the
useful features from the C preprocessor and any other macro processors
you know. The tricky bit will be choosing how much knowledge of Perl's
syntax you want your filter to have.

=head1 LIMITATIONS

Source filters work only at the level of strings, so they are highly
limited in their ability to change source code on the fly. A filter
cannot detect comments, quoted strings or heredocs; it is no
replacement for a real parser. The only stable uses for source filters
are encryption, compression, and the byteloader, which translates
binary code back to source code.

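A short sketch makes the problem concrete. The hypothetical function C<naive_rename> below does what a line-oriented filter naturally does, which is a plain text substitution. As a result, it renames the variable in the comment and in the string literal as well as in the code.

```perl
use strict;
use warnings;

# Naive rename of $old to $new on the raw text, the way a
# line-oriented source filter sees it (hypothetical helper).
sub naive_rename {
    my ($text) = @_;
    $text =~ s/\$old\b/\$new/g;
    return $text;
}

my $source = <<'CODE';
my $old = 42;    # $old is the legacy name
print "Set \$old to $old\n";
CODE

print naive_rename($source);
```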
See, for example, the limitations of L<Switch>, which uses source
filters and thus does not work inside a string eval. Regexes with
embedded newlines that are specified with raw C</.../> delimiters and
don't have a C<//x> modifier are indistinguishable from code chunks
beginning with the division operator C</>. As a workaround you must use
C<m/.../> or C<m?...?> for such patterns. Also, the presence of regexes
specified with raw C<?...?> delimiters may cause mysterious errors. The
workaround is to use C<m?...?> instead. See
L<http://search.cpan.org/perldoc?Switch#LIMITATIONS>.

Currently the content of the C<__DATA__> block is not filtered.

Currently internal buffer lengths are limited to 32 bits.

=head1 THINGS TO LOOK OUT FOR

=over 5

=item Some Filters Clobber the C<DATA> Handle

Some source filters use the C<DATA> handle to read the calling program.
When using these source filters you cannot rely on this handle, nor expect
any particular kind of behavior when operating on it. Filters based on
Filter::Util::Call (and therefore Filter::Simple) do not alter the C<DATA>
filehandle, but on the other hand totally ignore the text after C<__DATA__>.

=back

=head1 REQUIREMENTS

The Source Filters distribution is available on CPAN, in

    CPAN/modules/by-module/Filter

Starting from Perl 5.8, Filter::Util::Call (the core part of the
Source Filters distribution) is part of the standard Perl distribution.
Also included is a friendlier interface called Filter::Simple, by
Damian Conway.

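As a small taste of Filter::Simple, here is a sketch with a hypothetical module called C<TrueToOne>. It uses Filter::Simple's C<FILTER_ONLY> form with a C<code> filter, so the substitution is applied to code but not to quoted strings. A filter only acts on a file that Perl is compiling, so the sketch writes the module and a test script to a temporary directory and runs the script with a separate C<perl> (C<$^X>).

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);
use File::Spec;

# Hypothetical filter: replaces the bareword TRUE with 1, but only in
# code, so the string "TRUE is " is left alone.
my $module = <<'PM';
package TrueToOne;
use Filter::Simple;
FILTER_ONLY code => sub { s/\bTRUE\b/1/g };
1;
PM

my $script = qq{use TrueToOne;\nprint "TRUE is ", TRUE, "\\n";\n};

my $dir = tempdir(CLEANUP => 1);
for ([ 'TrueToOne.pm', $module ], [ 'true.pl', $script ]) {
    my ($name, $text) = @$_;
    my $path = File::Spec->catfile($dir, $name);
    open my $fh, '>', $path or die "Cannot write $path: $!\n";
    print {$fh} $text;
    close $fh or die "Cannot close $path: $!\n";
}

open my $run, '-|', $^X, "-I$dir", File::Spec->catfile($dir, 'true.pl')
    or die "Cannot run perl: $!\n";
my $output = do { local $/; <$run> };
close $run;

print $output;
```

Using a plain C<FILTER> block with the same substitution would rewrite the string as well, printing C<1 is 1>.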
=head1 AUTHOR

Paul Marquess E<lt>Paul.Marquess@btinternet.comE<gt>

Reini Urban E<lt>rurban@cpan.orgE<gt>

=head1 Copyrights

The first version of this article originally appeared in The Perl
Journal #11, and is copyright 1998 The Perl Journal. It appears
courtesy of Jon Orwant and The Perl Journal. This document may be
distributed under the same terms as Perl itself.