Commit | Line | Data |
---|---|---|
54a137f5 GS |
1 | =head1 NAME |
2 | ||
3 | perlcompile - Introduction to the Perl Compiler-Translator | |
4 | ||
5 | =head1 DESCRIPTION | |
6 | ||
7 | Perl has always had a compiler: your source is compiled into an | |
8 | internal form (a parse tree) which is then optimized before being | |
9 | run. Since version 5.005, Perl has shipped with a module | |
10 | capable of inspecting the optimized parse tree (C<B>), and this has | |
11 | been used to write many useful utilities, including a module that lets | |
d1be9408 | 12 | you turn your Perl into C source code that can be compiled into a |
54a137f5 GS |
13 | native executable. |
14 | ||
15 | The C<B> module provides access to the parse tree, and other modules | |
16 | ("back ends") do things with the tree. Some write it out as | |
17 | bytecode, C source code, or a semi-human-readable text. Another | |
18 | traverses the parse tree to build a cross-reference of which | |
19 | subroutines, formats, and variables are used where. Another checks | |
20 | your code for dubious constructs. Yet another back end dumps the | |
21 | parse tree back out as Perl source, acting as a source code beautifier | |
22 | or deobfuscator. | |
23 | ||
24 | Because its original purpose was to be a way to produce C code | |
25 | corresponding to a Perl program, and in turn a native executable, the | |
26 | C<B> module and its associated back ends are known as "the | |
27 | compiler", even though they don't really compile anything. | |
28 | Different parts of the compiler are more accurately a "translator", | |
29 | or an "inspector", but people want Perl to have a "compiler | |
30 | option" not an "inspector gadget". What can you do? | |
31 | ||
32 | This document covers the use of the Perl compiler: which modules | |
33 | it comprises, how to use the most important of the back end modules, | |
34 | what problems there are, and how to work around them. | |
35 | ||
36 | =head2 Layout | |
37 | ||
38 | The compiler back ends are in the C<B::> hierarchy, and the front-end | |
39 | (the module that you, the user of the compiler, will sometimes | |
40 | interact with) is the O module. Some back ends (e.g., C<B::C>) have | |
41 | programs (e.g., I<perlcc>) to hide the modules' complexity. | |
42 | ||
43 | Here are the important back ends to know about, with their status | |
44 | expressed as a number from 0 (outline for later implementation) to | |
45 | 10 (if there's a bug in it, we're very surprised): | |
46 | ||
47 | =over 4 | |
48 | ||
49 | =item B::Bytecode | |
50 | ||
51 | Stores the parse tree in a machine-independent format, suitable | |
52 | for later reloading through the ByteLoader module. Status: 5 (some | |
53 | things work, some things don't, some things are untested). | |
54 | ||
55 | =item B::C | |
56 | ||
57 | Creates a C source file containing code to rebuild the parse tree | |
58 | and resume the interpreter. Status: 6 (many things work adequately, | |
59 | including programs using Tk). | |
60 | ||
61 | =item B::CC | |
62 | ||
63 | Creates a C source file corresponding to the run time code path in | |
64 | the parse tree. This is the closest to a Perl-to-C translator there | |
65 | is, but the code it generates is almost incomprehensible because it | |
66 | translates the parse tree into a giant switch structure that | |
67 | manipulates Perl structures. Eventual goal is to reduce (given | |
68 | sufficient type information in the Perl program) some of the | |
69 | Perl data structure manipulations into manipulations of C-level | |
70 | ints, floats, etc. Status: 5 (some things work, including | |
71 | uncomplicated Tk examples). | |
72 | ||
73 | =item B::Lint | |
74 | ||
75 | Complains if it finds dubious constructs in your source code. Status: | |
76 | 6 (it works adequately, but only has a very limited number of areas | |
77 | that it checks). | |
78 | ||
79 | =item B::Deparse | |
80 | ||
81 | Recreates the Perl source, making an attempt to format it coherently. | |
82 | Status: 8 (it works nicely, but a few obscure things are missing). | |
83 | ||
84 | =item B::Xref | |
85 | ||
86 | Reports on the declaration and use of subroutines and variables. | |
87 | Status: 8 (it works nicely, but still has a few lingering bugs). | |
88 | ||
89 | =back | |
90 | ||
91 | =head1 Using The Back Ends | |
92 | ||
93 | The following sections describe how to use the various compiler back | |
94 | ends. They're presented roughly in order of maturity, so that the | |
95 | most stable and proven back ends are described first, and the most | |
96 | experimental and incomplete back ends are described last. | |
97 | ||
98 | The O module automatically enables the B<-c> flag to Perl, which | |
99 | prevents Perl from executing your code once it has been compiled. | |
100 | This is why all the back ends print: | |
101 | ||
102 | myperlprogram syntax OK | |
103 | ||
104 | before producing any other output. | |
105 | ||
4a4eefd0 | 106 | =head2 The Cross Referencing Back End |
54a137f5 | 107 | |
4a4eefd0 | 108 | The cross referencing back end (B::Xref) produces a report on your program, |
54a137f5 GS |
109 | breaking down declarations and uses of subroutines and variables (and |
110 | formats) by file and subroutine. For instance, here's part of the | |
111 | report from the I<pod2man> program that comes with Perl: | |
112 | ||
113 | Subroutine clear_noremap | |
114 | Package (lexical) | |
115 | $ready_to_print i1069, 1079 | |
116 | Package main | |
117 | $& 1086 | |
118 | $. 1086 | |
119 | $0 1086 | |
120 | $1 1087 | |
121 | $2 1085, 1085 | |
122 | $3 1085, 1085 | |
123 | $ARGV 1086 | |
124 | %HTML_Escapes 1085, 1085 | |
125 | ||
126 | This shows the variables used in the subroutine C<clear_noremap>. The | |
127 | variable C<$ready_to_print> is a my() (lexical) variable, | |
128 | B<i>ntroduced (first declared with my()) on line 1069, and used on | |
129 | line 1079. The variable C<$&> from the main package is used on 1086, | |
130 | and so on. | |
131 | ||
132 | A line number may be prefixed by a single letter: | |
133 | ||
134 | =over 4 | |
135 | ||
136 | =item i | |
137 | ||
138 | Lexical variable introduced (declared with my()) for the first time. | |
139 | ||
140 | =item & | |
141 | ||
142 | Subroutine or method call. | |
143 | ||
144 | =item s | |
145 | ||
146 | Subroutine defined. | |
147 | ||
148 | =item r | |
149 | ||
150 | Format defined. | |
151 | ||
152 | =back | |
153 | ||
154 | The most useful option the cross referencer has is to save the report | |
155 | to a separate file. For instance, to save the report on | |
156 | I<myperlprogram> to the file I<report>: | |
157 | ||
158 | $ perl -MO=Xref,-oreport myperlprogram | |
159 | ||
160 | =head2 The Decompiling Back End | |
161 | ||
162 | The Deparse back end turns your Perl source back into Perl source. It | |
163 | can reformat along the way, making it useful as a de-obfuscator. The | |
164 | most basic way to use it is: | |
165 | ||
166 | $ perl -MO=Deparse myperlprogram | |
167 | ||
168 | You'll notice immediately that Perl has no idea of how to paragraph | |
169 | your code. You'll have to separate chunks of code from each other | |
170 | with newlines by hand. However, watch what it will do with | |
171 | one-liners: | |
172 | ||
173 | $ perl -MO=Deparse -e '$op=shift||die "usage: $0 | |
174 | code [...]";chomp(@ARGV=<>)unless@ARGV; for(@ARGV){$was=$_;eval$op; | |
175 | die$@ if$@; rename$was,$_ unless$was eq $_}' | |
176 | -e syntax OK | |
177 | $op = shift @ARGV || die("usage: $0 code [...]"); | |
178 | chomp(@ARGV = <ARGV>) unless @ARGV; | |
179 | foreach $_ (@ARGV) { | |
180 | $was = $_; | |
181 | eval $op; | |
182 | die $@ if $@; | |
183 | rename $was, $_ unless $was eq $_; | |
184 | } | |
185 | ||
54a137f5 GS |
186 | The decompiler has several options for the code it generates. For |
187 | instance, you can set the size of each indent from 4 (as above) to | |
188 | 2 with: | |
189 | ||
190 | $ perl -MO=Deparse,-si2 myperlprogram | |
191 | ||
192 | The B<-p> option adds parentheses where normally they are omitted: | |
193 | ||
194 | $ perl -MO=Deparse -e 'print "Hello, world\n"' | |
195 | -e syntax OK | |
196 | print "Hello, world\n"; | |
197 | $ perl -MO=Deparse,-p -e 'print "Hello, world\n"' | |
198 | -e syntax OK | |
199 | print("Hello, world\n"); | |
200 | ||
201 | See L<B::Deparse> for more information on the formatting options. | |
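B::Deparse can also be called from inside a program rather than through C<-MO=>. The following is a minimal sketch using its C<new> and C<coderef2text> methods, passing the same C<-p> and C<-si2> options shown above. The sub C<add> is a made-up example, and the exact text printed may vary between Perl versions.

```perl
use strict;
use warnings;
use B::Deparse;

# Hypothetical sub, written tersely on purpose so there is something to tidy.
sub add { my($x,$y)=@_;return $x+$y }

# Options are passed as they would appear after "-MO=Deparse,".
my $deparse = B::Deparse->new('-p', '-si2');

# coderef2text returns the body of the sub as a string, braces included.
my $body = $deparse->coderef2text(\&add);

print "sub add $body\n";
```

Only the body comes back, so the C<sub add> prefix is added by hand when printing.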
202 | ||
4a4eefd0 | 203 | =head2 The Lint Back End |
54a137f5 | 204 | |
4a4eefd0 GS |
205 | The lint back end (B::Lint) inspects programs for poor style. One |
206 | programmer's bad style is another programmer's useful tool, so options | |
207 | let you select what is complained about. | |
54a137f5 GS |
208 | |
209 | To run the style checker across your source code: | |
210 | ||
211 | $ perl -MO=Lint myperlprogram | |
212 | ||
213 | To disable context checks and undefined subroutines: | |
214 | ||
215 | $ perl -MO=Lint,-context,-undefined-subs myperlprogram | |
216 | ||
217 | See L<B::Lint> for information on the options. | |
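As a concrete case of what the context check looks for, the B::Lint entry in the module list cites an array used in scalar context without C<scalar()>. The sketch below shows both spellings. The variable names are made up for illustration, and both assignments are ordinary Perl that behave identically.

```perl
use strict;
use warnings;

my @queue = ('a', 'b', 'c');    # hypothetical data

# Array in implicit scalar context: legal Perl, but the kind of
# construct the B::Lint entry in the module list says Lint can identify.
my $implicit = @queue;

# Explicit form, which states the intent.
my $explicit = scalar(@queue);

print "$implicit $explicit\n";    # prints "3 3"
```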
218 | ||
219 | =head2 The Simple C Back End | |
220 | ||
221 | This module saves the internal compiled state of your Perl program | |
222 | to a C source file, which can be turned into a native executable | |
223 | for that particular platform using a C compiler. The resulting | |
224 | program links against the Perl interpreter library, so it | |
225 | will not save you disk space (unless you build Perl with a shared | |
226 | library) or program size. It may, however, save you startup time. | |
227 | ||
228 | The C<perlcc> tool generates such executables by default. | |
229 | ||
230 | perlcc myperlprogram.pl | |
231 | ||
232 | =head2 The Bytecode Back End | |
233 | ||
234 | This back end is only useful if you also have a way to load and | |
235 | execute the bytecode that it produces. The ByteLoader module provides | |
236 | this functionality. | |
237 | ||
238 | To turn a Perl program into executable byte code, you can use C<perlcc> | |
d9ba819c | 239 | with the C<-B> switch: |
54a137f5 | 240 | |
d9ba819c | 241 | perlcc -B myperlprogram.pl |
54a137f5 GS |
242 | |
243 | The byte code is machine independent, so once you have a compiled | |
244 | module or program, it is as portable as Perl source (assuming that | |
245 | the user of the module or program has a modern-enough Perl interpreter | |
246 | to decode the byte code). | |
247 | ||
248 | See L<B::Bytecode> for information on options to control the | |
249 | optimization and nature of the code generated by the Bytecode module. | |
250 | ||
251 | =head2 The Optimized C Back End | |
252 | ||
253 | The optimized C back end will turn your Perl program's run time | |
254 | code-path into an equivalent (but optimized) C program that manipulates | |
255 | the Perl data structures directly. The program will still link against | |
256 | the Perl interpreter library, to allow for eval(), C<s///e>, | |
257 | C<require>, etc. | |
258 | ||
d9ba819c | 259 | The C<perlcc> tool generates such executables when using the C<-O> |
54a137f5 GS |
260 | switch. To compile a Perl program (ending in C<.pl> |
261 | or C<.p>): | |
262 | ||
d9ba819c | 263 | perlcc -O myperlprogram.pl |
54a137f5 GS |
264 | |
265 | To produce a shared library from a Perl module (ending in C<.pm>): | |
266 | ||
d9ba819c | 267 | perlcc -O Myperlmodule.pm |
54a137f5 GS |
268 | |
269 | For more information, see L<perlcc> and L<B::CC>. | |
270 | ||
384e87d1 RGS |
271 | =head1 Module List for the Compiler Suite |
272 | ||
54a137f5 GS |
273 | =over 4 |
274 | ||
275 | =item B | |
276 | ||
277 | This module is the introspective ("reflective" in Java terms) | |
278 | module, which allows a Perl program to inspect its innards. The | |
279 | back end modules all use this module to gain access to the compiled | |
280 | parse tree. You, the user of a back end module, will not need to | |
281 | interact with B. | |
282 | ||
283 | =item O | |
284 | ||
285 | This module is the front-end to the compiler's back ends. It is normally | |
286 | called something like this: | |
287 | ||
288 | $ perl -MO=Deparse myperlprogram | |
289 | ||
290 | This is like saying C<use O 'Deparse'> in your Perl program. | |
291 | ||
292 | =item B::Asmdata | |
293 | ||
294 | This module is used by the B::Assembler module, which is in turn used | |
295 | by the B::Bytecode module, which stores a parse-tree as | |
296 | bytecode for later loading. It's not a back end itself, but rather a | |
297 | component of a back end. | |
298 | ||
299 | =item B::Assembler | |
300 | ||
301 | This module turns a parse-tree into data suitable for storing | |
302 | and later decoding back into a parse-tree. It's not a back end | |
303 | itself, but rather a component of a back end. It's used by the | |
304 | I<assemble> program that produces bytecode. | |
305 | ||
306 | =item B::Bblock | |
307 | ||
200f06d0 GS |
308 | This module is used by the B::CC back end. It walks "basic blocks". |
309 | A basic block is a series of operations which is known to execute from | |
4375e838 | 310 | start to finish, with no possibility of branching or halting. |
54a137f5 GS |
311 | |
312 | =item B::Bytecode | |
313 | ||
314 | This module is a back end that generates bytecode from a | |
315 | program's parse tree. This bytecode is written to a file, from where | |
316 | it can later be reconstructed back into a parse tree. The goal is to | |
317 | do the expensive program compilation once, save the interpreter's | |
318 | state into a file, and then restore the state from the file when the | |
319 | program is to be executed. See L</"The Bytecode Back End"> | |
320 | for details about usage. | |
321 | ||
322 | =item B::C | |
323 | ||
324 | This module writes out C code corresponding to the parse tree and | |
325 | other interpreter internal structures. You compile the resulting | |
326 | C file to get an executable that restores the internal structures, | |
327 | after which the Perl interpreter begins running the | |
328 | program. See L</"The Simple C Back End"> for details about usage. | |
329 | ||
330 | =item B::CC | |
331 | ||
332 | This module writes out C code corresponding to your program's | |
333 | operations. Unlike the B::C module, which merely stores the | |
334 | interpreter and its state in a C program, the B::CC module makes a | |
335 | C program that does not involve the interpreter. As a consequence, | |
336 | programs translated into C by B::CC can execute faster than normal | |
337 | interpreted programs. See L</"The Optimized C Back End"> for | |
338 | details about usage. | |
339 | ||
384e87d1 RGS |
340 | =item B::Concise |
341 | ||
342 | This module prints a concise (but complete) version of the Perl parse | |
343 | tree. Its output is more customizable than that of B::Terse or | |
344 | B::Debug (and it can emulate them). This module is useful for people who | |
345 | are writing their own back end, or who are learning about the Perl | |
346 | internals. It's not useful to the average programmer. | |
347 | ||
54a137f5 GS |
348 | =item B::Debug |
349 | ||
350 | This module dumps the Perl parse tree in verbose detail to STDOUT. | |
351 | It's useful for people who are writing their own back end, or who | |
352 | are learning about the Perl internals. It's not useful to the | |
353 | average programmer. | |
354 | ||
355 | =item B::Deparse | |
356 | ||
357 | This module produces Perl source code from the compiled parse tree. | |
358 | It is useful for debugging and deconstructing other people's code, | |
359 | and also as a pretty-printer for your own source. See | |
360 | L</"The Decompiling Back End"> for details about usage. | |
361 | ||
362 | =item B::Disassembler | |
363 | ||
364 | This module turns bytecode back into a parse tree. It's not a back | |
365 | end itself, but rather a component of a back end. It's used by the | |
366 | I<disassemble> program that comes with the bytecode. | |
367 | ||
368 | =item B::Lint | |
369 | ||
370 | This module inspects the compiled form of your source code for things | |
371 | which, while some people frown on them, aren't necessarily bad enough | |
372 | to justify a warning. For instance, use of an array in scalar context | |
373 | without explicitly saying C<scalar(@array)> is something that Lint | |
374 | can identify. See L</"The Lint Back End"> for details about usage. | |
375 | ||
376 | =item B::Showlex | |
377 | ||
378 | This module prints out the my() variables used in a function or a | |
4375e838 | 379 | file. To get a list of the my() variables used in the subroutine |
54a137f5 GS |
380 | mysub() defined in the file myperlprogram: |
381 | ||
382 | $ perl -MO=Showlex,mysub myperlprogram | |
383 | ||
4375e838 | 384 | To get a list of the my() variables used in the file myperlprogram: |
54a137f5 GS |
385 | |
386 | $ perl -MO=Showlex myperlprogram | |
387 | ||
388 | [BROKEN] | |
389 | ||
390 | =item B::Stackobj | |
391 | ||
392 | This module is used by the B::CC module. It's not a back end itself, | |
393 | but rather a component of a back end. | |
394 | ||
395 | =item B::Stash | |
396 | ||
397 | This module is used by the L<perlcc> program, which compiles a module | |
398 | into an executable. B::Stash prints the symbol tables in use by a | |
399 | program, and is used to prevent B::CC from producing C code for the | |
400 | B::* and O modules. It's not a back end itself, but rather a | |
401 | component of a back end. | |
402 | ||
403 | =item B::Terse | |
404 | ||
405 | This module prints the contents of the parse tree, but without as much | |
406 | information as B::Debug. For comparison, C<print "Hello, world."> | |
407 | produced 96 lines of output from B::Debug, but only 6 from B::Terse. | |
408 | ||
409 | This module is useful for people who are writing their own back end, | |
410 | or who are learning about the Perl internals. It's not useful to the | |
411 | average programmer. | |
412 | ||
413 | =item B::Xref | |
414 | ||
415 | This module prints a report on where the variables, subroutines, and | |
416 | formats are defined and used within a program and the modules it | |
417 | loads. See L</"The Cross Referencing Back End"> for details about | |
418 | usage. | |
419 | ||
a45bd81d | 420 | =back |
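Although users of a back end do not need to call B themselves, a brief sketch can show the kind of introspection the back ends build on. It assumes the C<B::svref_2object> function and the C<GV>, C<STASH>, and C<NAME> methods documented in L<B>. The helper C<sub_name> and the sub C<greet> are hypothetical names used only for illustration.

```perl
use strict;
use warnings;
use B ();

# Hypothetical helper: given a code reference, return the fully
# qualified name of the sub as recorded in the interpreter's structures.
sub sub_name {
    my ($coderef) = @_;
    my $gv = B::svref_2object($coderef)->GV;    # B::CV object -> its B::GV
    return $gv->STASH->NAME . '::' . $gv->NAME;
}

sub greet { return "hello" }    # hypothetical example sub

print sub_name(\&greet), "\n";
```

Run as a script, this reports the sub's package and name without calling the sub.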
54a137f5 GS |
421 | |
422 | =head1 KNOWN PROBLEMS | |
423 | ||
424 | The simple C backend currently only saves typeglobs with alphanumeric | |
425 | names. | |
426 | ||
427 | The optimized C backend outputs code for more modules than it should | |
428 | (e.g., DirHandle). It also has little hope of properly handling | |
4375e838 | 429 | C<goto LABEL> outside the running subroutine (C<goto &sub> is okay). |
54a137f5 GS |
430 | C<goto LABEL> currently does not work at all in this backend. |
431 | It also creates a huge initialization function that gives | |
432 | C compilers headaches. Splitting the initialization function gives | |
433 | better results. Other problems include: unsigned math does not | |
434 | work correctly; some opcodes are handled incorrectly by the default | |
435 | opcode handling mechanism. | |
436 | ||
437 | BEGIN{} blocks are executed while compiling your code. Any external | |
438 | state that is initialized in BEGIN{}, such as opened files or initiated | |
439 | database connections, does not behave properly. To work around | |
440 | this, Perl has an INIT{} block that corresponds to code being executed | |
441 | before your program begins running but after your program has finished | |
442 | being compiled. Execution order: BEGIN{}, (possible save of state | |
443 | through compiler back-end), INIT{}, program runs, END{}. | |
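The ordering described above can be observed with a small script. This is only a sketch: the C<@order> array is a made-up name, and the script is meant to be run directly with perl, not through a back end.

```perl
use strict;
use warnings;

our @order;    # package variable shared by the compile-time and run-time code

# BEGIN runs while the code is still being compiled, so it also runs
# under "perl -c".
BEGIN { push @order, 'BEGIN' }

# INIT runs once compilation has finished, just before run time begins.
# Opening files or connections here keeps that work out of compile time.
INIT  { push @order, 'INIT' }

# END runs as the program exits.
END   { print join(' -> ', @order, 'END'), "\n" }

push @order, 'main';    # ordinary run-time code

# Prints: BEGIN -> INIT -> main -> END
```

The blocks are declared in source order here, but each phase is decided by the block's type, not by where it sits in the file.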
444 | ||
445 | =head1 AUTHOR | |
446 | ||
447 | This document was originally written by Nathan Torkington, and is now | |
448 | maintained by the perl5-porters mailing list | |
449 | I<perl5-porters@perl.org>. | |
450 | ||
451 | =cut |