Commit | Line | Data |
---|---|---|
54a137f5 GS |
1 | =head1 NAME |
2 | ||
3 | perlcompile - Introduction to the Perl Compiler-Translator | |
4 | ||
5 | =head1 DESCRIPTION | |
6 | ||
7 | Perl has always had a compiler: your source is compiled into an | |
8 | internal form (a parse tree) which is then optimized before being | |
9 | run. Since version 5.005, Perl has shipped with a module | |
10 | capable of inspecting the optimized parse tree (C<B>), and this has | |
11 | been used to write many useful utilities, including a module that lets | |
12 | you turn your Perl into C source code that can be compiled into a |
13 | native executable. | |
14 | ||
15 | The C<B> module provides access to the parse tree, and other modules | |
16 | ("back ends") do things with the tree. Some write it out as | |
17 | bytecode, C source code, or a semi-human-readable text. Another | |
18 | traverses the parse tree to build a cross-reference of which | |
19 | subroutines, formats, and variables are used where. Another checks | |
20 | your code for dubious constructs. Yet another back end dumps the | |
21 | parse tree back out as Perl source, acting as a source code beautifier | |
22 | or deobfuscator. | |
23 | ||
24 | Because its original purpose was to be a way to produce C code | |
25 | corresponding to a Perl program, and in turn a native executable, the | |
26 | C<B> module and its associated back ends are known as "the | |
27 | compiler", even though they don't really compile anything. | |
28 | Different parts of the compiler are more accurately a "translator", | |
29 | or an "inspector", but people want Perl to have a "compiler | |
30 | option" not an "inspector gadget". What can you do? | |
31 | ||
32 | This document covers the use of the Perl compiler: which modules | |
33 | it comprises, how to use the most important of the back end modules, | |
34 | what problems there are, and how to work around them. | |
35 | ||
36 | =head2 Layout | |
37 | ||
38 | The compiler back ends are in the C<B::> hierarchy, and the front-end | |
39 | (the module that you, the user of the compiler, will sometimes | |
40 | interact with) is the O module. Some back ends (e.g., C<B::C>) have | |
41 | programs (e.g., I<perlcc>) to hide the modules' complexity. | |
42 | ||
43 | Here are the important back ends to know about, with their status | |
44 | expressed as a number from 0 (outline for later implementation) to | |
45 | 10 (if there's a bug in it, we're very surprised): | |
46 | ||
47 | =over 4 | |
48 | ||
49 | =item B::Bytecode | |
50 | ||
51 | Stores the parse tree in a machine-independent format, suitable | |
52 | for later reloading through the ByteLoader module. Status: 5 (some | |
53 | things work, some things don't, some things are untested). | |
54 | ||
55 | =item B::C | |
56 | ||
57 | Creates a C source file containing code to rebuild the parse tree | |
58 | and resume the interpreter. Status: 6 (many things work adequately, | |
59 | including programs using Tk). | |
60 | ||
61 | =item B::CC | |
62 | ||
63 | Creates a C source file corresponding to the run time code path in | |
64 | the parse tree. This is the closest to a Perl-to-C translator there | |
65 | is, but the code it generates is almost incomprehensible because it | |
66 | translates the parse tree into a giant switch structure that | |
67 | manipulates Perl structures. Eventual goal is to reduce (given | |
68 | sufficient type information in the Perl program) some of the | |
69 | Perl data structure manipulations into manipulations of C-level | |
70 | ints, floats, etc. Status: 5 (some things work, including | |
71 | uncomplicated Tk examples). | |
72 | ||
73 | =item B::Lint | |
74 | ||
75 | Complains if it finds dubious constructs in your source code. Status: | |
76 | 6 (it works adequately, but only has a very limited number of areas | |
77 | that it checks). | |
78 | ||
79 | =item B::Deparse | |
80 | ||
81 | Recreates the Perl source, making an attempt to format it coherently. | |
82 | Status: 8 (it works nicely, but a few obscure things are missing). | |
83 | ||
84 | =item B::Xref | |
85 | ||
86 | Reports on the declaration and use of subroutines and variables. | |
87 | Status: 8 (it works nicely, but still has a few lingering bugs). | |
88 | ||
89 | =back | |
90 | ||
91 | =head1 Using The Back Ends | |
92 | ||
93 | The following sections describe how to use the various compiler back | |
94 | ends. They're presented roughly in order of maturity, so that the | |
95 | most stable and proven back ends are described first, and the most | |
96 | experimental and incomplete back ends are described last. | |
97 | ||
98 | The O module automatically enables the B<-c> flag to Perl, which |
99 | prevents Perl from executing your code once it has been compiled. | |
100 | This is why all the back ends print: | |
101 | ||
102 | myperlprogram syntax OK | |
103 | ||
104 | before producing any other output. | |
105 | ||
106 | =head2 The Cross Referencing Back End (B::Xref) | |
107 | ||
108 | The cross referencing back end produces a report on your program, | |
109 | breaking down declarations and uses of subroutines and variables (and | |
110 | formats) by file and subroutine. For instance, here's part of the | |
111 | report from the I<pod2man> program that comes with Perl: | |
112 | ||
113 | Subroutine clear_noremap | |
114 | Package (lexical) | |
115 | $ready_to_print i1069, 1079 | |
116 | Package main | |
117 | $& 1086 | |
118 | $. 1086 | |
119 | $0 1086 | |
120 | $1 1087 | |
121 | $2 1085, 1085 | |
122 | $3 1085, 1085 | |
123 | $ARGV 1086 | |
124 | %HTML_Escapes 1085, 1085 | |
125 | ||
126 | This shows the variables used in the subroutine C<clear_noremap>. The | |
127 | variable C<$ready_to_print> is a my() (lexical) variable, | |
128 | B<i>ntroduced (first declared with my()) on line 1069, and used on | |
129 | line 1079. The variable C<$&> from the main package is used on 1086, | |
130 | and so on. | |
131 | ||
132 | A line number may be prefixed by a single letter: | |
133 | ||
134 | =over 4 | |
135 | ||
136 | =item i | |
137 | ||
138 | Lexical variable introduced (declared with my()) for the first time. | |
139 | ||
140 | =item & | |
141 | ||
142 | Subroutine or method call. | |
143 | ||
144 | =item s | |
145 | ||
146 | Subroutine defined. | |
147 | ||
148 | =item r | |
149 | ||
150 | Format defined. | |
151 | ||
152 | =back | |
153 | ||
154 | The most useful option the cross referencer has is to save the report | |
155 | to a separate file. For instance, to save the report on | |
156 | I<myperlprogram> to the file I<report>: | |
157 | ||
158 | $ perl -MO=Xref,-oreport myperlprogram | |
159 | ||
160 | =head2 The Decompiling Back End | |
161 | ||
162 | The Deparse back end turns your Perl source back into Perl source. It | |
163 | can reformat along the way, making it useful as a de-obfuscator. The | |
164 | most basic way to use it is: | |
165 | ||
166 | $ perl -MO=Deparse myperlprogram | |
167 | ||
168 | You'll notice immediately that Perl has no idea of how to paragraph | |
169 | your code. You'll have to separate chunks of code from each other | |
170 | with newlines by hand. However, watch what it will do with | |
171 | one-liners: | |
172 | ||
173 | $ perl -MO=Deparse -e '$op=shift||die "usage: $0 | |
174 | code [...]";chomp(@ARGV=<>)unless@ARGV; for(@ARGV){$was=$_;eval$op; | |
175 | die$@ if$@; rename$was,$_ unless$was eq $_}' | |
176 | -e syntax OK | |
177 | $op = shift @ARGV || die("usage: $0 code [...]"); | |
178 | chomp(@ARGV = <ARGV>) unless @ARGV; | |
179 | foreach $_ (@ARGV) { | |
180 | $was = $_; | |
181 | eval $op; | |
182 | die $@ if $@; | |
183 | rename $was, $_ unless $was eq $_; | |
184 | } | |
185 | ||
186 | (this is the I<rename> program that comes in the I<eg/> directory | |
187 | of the Perl source distribution). | |
188 | ||
189 | The decompiler has several options for the code it generates. For | |
190 | instance, you can set the size of each indent from 4 (as above) to | |
191 | 2 with: | |
192 | ||
193 | $ perl -MO=Deparse,-si2 myperlprogram | |
194 | ||
195 | The B<-p> option adds parentheses where normally they are omitted: | |
196 | ||
197 | $ perl -MO=Deparse -e 'print "Hello, world\n"' | |
198 | -e syntax OK | |
199 | print "Hello, world\n"; | |
200 | $ perl -MO=Deparse,-p -e 'print "Hello, world\n"' | |
201 | -e syntax OK | |
202 | print("Hello, world\n"); | |
203 | ||
204 | See L<B::Deparse> for more information on the formatting options. | |
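
The decompiler can also be driven from inside a program instead of from the command line. The following is a minimal sketch, assuming the C<new> and C<coderef2text> methods described in L<B::Deparse>; the subroutine C<$add_scaled> is a hypothetical example invented for illustration, and the exact layout of the output may vary between Perl versions:

```perl
use strict;
use warnings;
use B::Deparse;

# Hypothetical subroutine, used only to have something to decompile.
my $add_scaled = sub {
    my ($x, $y) = @_;
    return $x + $y * 2;
};

# '-p' requests full parenthesization, as with -MO=Deparse,-p.
my $deparser = B::Deparse->new('-p');

# coderef2text returns the body of the subroutine as Perl source text.
my $text = $deparser->coderef2text($add_scaled);
print "sub $text\n";
```

Because B<-p> is in effect, the arithmetic in the printed body should appear with explicit parentheses around each subexpression.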
205 | ||
206 | =head2 The Lint Back End (B::Lint) | |
207 | ||
208 | The lint back end inspects programs for poor style. One programmer's | |
209 | bad style is another programmer's useful tool, so options let you | |
210 | select what is complained about. | |
211 | ||
212 | To run the style checker across your source code: | |
213 | ||
214 | $ perl -MO=Lint myperlprogram | |
215 | ||
216 | To disable context checks and undefined subroutines: | |
217 | ||
218 | $ perl -MO=Lint,-context,-undefined-subs myperlprogram | |
219 | ||
220 | See L<B::Lint> for information on the options. | |
221 | ||
222 | =head2 The Simple C Back End | |
223 | ||
224 | This module saves the internal compiled state of your Perl program | |
225 | to a C source file, which can be turned into a native executable | |
226 | for that particular platform using a C compiler. The resulting | |
227 | program links against the Perl interpreter library, so it | |
228 | will not save you disk space (unless you build Perl with a shared | |
229 | library) or program size. It may, however, save you startup time. | |
230 | ||
231 | The C<perlcc> tool generates such executables by default. | |
232 | ||
233 | perlcc myperlprogram.pl | |
234 | ||
235 | =head2 The Bytecode Back End | |
236 | ||
237 | This back end is only useful if you also have a way to load and | |
238 | execute the bytecode that it produces. The ByteLoader module provides | |
239 | this functionality. | |
240 | ||
241 | To turn a Perl program into executable byte code, you can use C<perlcc> | |
242 | with the C<-b> switch: | |
243 | ||
244 | perlcc -b myperlprogram.pl | |
245 | ||
246 | The byte code is machine independent, so once you have a compiled | |
247 | module or program, it is as portable as Perl source (assuming that | |
248 | the user of the module or program has a modern-enough Perl interpreter | |
249 | to decode the byte code). | |
250 | ||
251 | See L<B::Bytecode> for information on options to control the |
252 | optimization and nature of the code generated by the Bytecode module. | |
253 | ||
254 | =head2 The Optimized C Back End | |
255 | ||
256 | The optimized C back end will turn your Perl program's run time | |
257 | code-path into an equivalent (but optimized) C program that manipulates | |
258 | the Perl data structures directly. The program will still link against | |
259 | the Perl interpreter library, to allow for eval(), C<s///e>, | |
260 | C<require>, etc. | |
261 | ||
262 | The C<perlcc> tool generates such executables when using the -opt | |
263 | switch. To compile a Perl program (ending in C<.pl> | |
264 | or C<.p>): | |
265 | ||
266 | perlcc -opt myperlprogram.pl | |
267 | ||
268 | To produce a shared library from a Perl module (ending in C<.pm>): | |
269 | ||
270 | perlcc -opt Myperlmodule.pm | |
271 | ||
272 | For more information, see L<perlcc> and L<B::CC>. | |
273 | ||
274 | =over 4 | |
275 | ||
276 | =item B | |
277 | ||
278 | This module is the introspective ("reflective" in Java terms) | |
279 | module, which allows a Perl program to inspect its innards. The | |
280 | back end modules all use this module to gain access to the compiled | |
281 | parse tree. You, the user of a back end module, will not need to | |
282 | interact with B. | |
283 | ||
284 | =item O | |
285 | ||
286 | This module is the front-end to the compiler's back ends. Normally | |
287 | called something like this: | |
288 | ||
289 | $ perl -MO=Deparse myperlprogram | |
290 | ||
291 | This is like saying C<use O 'Deparse'> in your Perl program. | |
292 | ||
293 | =item B::Asmdata | |
294 | ||
295 | This module is used by the B::Assembler module, which is in turn used | |
296 | by the B::Bytecode module, which stores a parse-tree as | |
297 | bytecode for later loading. It's not a back end itself, but rather a | |
298 | component of a back end. | |
299 | ||
300 | =item B::Assembler | |
301 | ||
302 | This module turns a parse-tree into data suitable for storing | |
303 | and later decoding back into a parse-tree. It's not a back end | |
304 | itself, but rather a component of a back end. It's used by the | |
305 | I<assemble> program that produces bytecode. | |
306 | ||
307 | =item B::Bblock | |
308 | ||
309 | This module is used by the B::CC back end. It walks "basic blocks": |
310 | straight-line runs of operations with no branch into or out of the middle. |
311 | ||
312 | =item B::Bytecode | |
313 | ||
314 | This module is a back end that generates bytecode from a | |
315 | program's parse tree. This bytecode is written to a file, from where | |
316 | it can later be reconstructed back into a parse tree. The goal is to | |
317 | do the expensive program compilation once, save the interpreter's | |
318 | state into a file, and then restore the state from the file when the | |
319 | program is to be executed. See L</"The Bytecode Back End"> | |
320 | for details about usage. | |
321 | ||
322 | =item B::C | |
323 | ||
324 | This module writes out C code corresponding to the parse tree and | |
325 | other interpreter internal structures. You compile the corresponding | |
326 | C file, and get an executable file that will restore the internal | |
327 | structures and the Perl interpreter will begin running the | |
328 | program. See L</"The Simple C Back End"> for details about usage. | |
329 | ||
330 | =item B::CC | |
331 | ||
332 | This module writes out C code corresponding to your program's | |
333 | operations. Unlike the B::C module, which merely stores the |
334 | interpreter and its state in a C program, the B::CC module emits C |
335 | code for the run time code path itself, although the result still |
336 | links against the Perl interpreter library. As a consequence, programs |
337 | translated into C by B::CC can execute faster than normal interpreted |
338 | programs. See L</"The Optimized C Back End"> for details about usage. |
339 | ||
340 | =item B::Debug | |
341 | ||
342 | This module dumps the Perl parse tree in verbose detail to STDOUT. | |
343 | It's useful for people who are writing their own back end, or who | |
344 | are learning about the Perl internals. It's not useful to the | |
345 | average programmer. | |
346 | ||
347 | =item B::Deparse | |
348 | ||
349 | This module produces Perl source code from the compiled parse tree. | |
350 | It is useful in debugging and deconstructing other people's code, | |
351 | also as a pretty-printer for your own source. See | |
352 | L</"The Decompiling Back End"> for details about usage. | |
353 | ||
354 | =item B::Disassembler | |
355 | ||
356 | This module turns bytecode back into a parse tree. It's not a back | |
357 | end itself, but rather a component of a back end. It's used by the | |
358 | I<disassemble> program that comes with the bytecode. | |
359 | ||
360 | =item B::Lint | |
361 | ||
362 | This module inspects the compiled form of your source code for things | |
363 | which, while some people frown on them, aren't necessarily bad enough | |
364 | to justify a warning. For instance, use of an array in scalar context | |
365 | without explicitly saying C<scalar(@array)> is something that Lint | |
366 | can identify. See L</"The Lint Back End (B::Lint)"> for details about usage. |
367 | ||
368 | =item B::Showlex | |
369 | ||
370 | This module prints out the my() variables used in a function or a | |
371 | file. To get a list of the my() variables used in the subroutine |
372 | mysub() defined in the file myperlprogram: | |
373 | ||
374 | $ perl -MO=Showlex,mysub myperlprogram | |
375 | ||
376 | To get a list of the my() variables used in the file myperlprogram: |
377 | ||
378 | $ perl -MO=Showlex myperlprogram | |
379 | ||
380 | [BROKEN] | |
381 | ||
382 | =item B::Stackobj | |
383 | ||
384 | This module is used by the B::CC module. It's not a back end itself, | |
385 | but rather a component of a back end. | |
386 | ||
387 | =item B::Stash | |
388 | ||
389 | This module is used by the L<perlcc> program, which compiles a module | |
390 | into an executable. B::Stash prints the symbol tables in use by a | |
391 | program, and is used to prevent B::CC from producing C code for the | |
392 | B::* and O modules. It's not a back end itself, but rather a | |
393 | component of a back end. | |
394 | ||
395 | =item B::Terse | |
396 | ||
397 | This module prints the contents of the parse tree, but without as much | |
398 | information as B::Debug. For comparison, C<print "Hello, world."> | |
399 | produced 96 lines of output from B::Debug, but only 6 from B::Terse. | |
400 | ||
401 | This module is useful for people who are writing their own back end, | |
402 | or who are learning about the Perl internals. It's not useful to the | |
403 | average programmer. | |
404 | ||
405 | =item B::Xref | |
406 | ||
407 | This module prints a report on where the variables, subroutines, and | |
408 | formats are defined and used within a program and the modules it | |
409 | loads. See L</"The Cross Referencing Back End (B::Xref)"> for details about |
410 | usage. | |
411 | ||
412 | =back |
413 | ||
414 | =head1 KNOWN PROBLEMS | |
415 | ||
416 | The simple C back end currently only saves typeglobs with alphanumeric |
417 | names. | |
418 | ||
419 | The optimized C back end outputs code for more modules than it should |
420 | (e.g., DirHandle). It also has little hope of properly handling |
421 | C<goto LABEL> outside the running subroutine (C<goto &sub> is ok). |
422 | C<goto LABEL> currently does not work at all in this back end. |
423 | It also creates a huge initialization function that gives | |
424 | C compilers headaches. Splitting the initialization function gives | |
425 | better results. Other problems include: unsigned math does not | |
426 | work correctly; some opcodes are handled incorrectly by the default |
427 | opcode handling mechanism. | |
428 | ||
429 | BEGIN{} blocks are executed while compiling your code. Any external |
430 | state that is initialized in BEGIN{}, such as open files or database |
431 | connections, does not behave properly, because it is set up at compile |
432 | time rather than when the saved program is later run. To work around |
433 | this, Perl has an INIT{} block, whose code is executed after your program |
434 | has finished being compiled but before it begins running. Execution |
435 | order: BEGIN{}, (possible save of state through a compiler back end), INIT{}, program runs, END{}. |
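
The execution order described above can be observed with a small script. This is a minimal sketch for illustration only; the C<@order> array is a hypothetical name, and the script is assumed to be run directly with C<perl> rather than through a compiler back end:

```perl
use strict;
use warnings;

our @order;    # records the phase in which each block ran

BEGIN { push @order, 'BEGIN' }    # runs while the program is being compiled
INIT  { push @order, 'INIT' }     # runs after compilation, before the main program
END   { push @order, 'END'; print join(' -> ', @order), "\n" }

push @order, 'main';              # ordinary run time code
```

External resources such as file handles are therefore better acquired in INIT{} than in BEGIN{}, so that they are set up when the program actually starts running.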
436 | ||
437 | =head1 AUTHOR | |
438 | ||
439 | This document was originally written by Nathan Torkington, and is now | |
440 | maintained by the perl5-porters mailing list | |
441 | I<perl5-porters@perl.org>. | |
442 | ||
443 | =cut |