=head1 NAME

perlcompile - Introduction to the Perl Compiler-Translator

=head1 DESCRIPTION

Perl has always had a compiler: your source is compiled into an
internal form (a parse tree) which is then optimized before being
run. Since version 5.005, Perl has shipped with a module
capable of inspecting the optimized parse tree (C<B>), and this has
been used to write many useful utilities, including a module that lets
you turn your Perl into C source code that can be compiled into a
native executable.

The C<B> module provides access to the parse tree, and other modules
("back ends") do things with the tree. Some write it out as
semi-human-readable text. Another traverses the parse tree to build a
cross-reference of which subroutines, formats, and variables are used
where. Another checks your code for dubious constructs. Yet another back
end dumps the parse tree back out as Perl source, acting as a source code
beautifier or deobfuscator.

Because its original purpose was to be a way to produce C code
corresponding to a Perl program, and in turn a native executable, the
C<B> module and its associated back ends are known as "the
compiler", even though they don't really compile anything.
Different parts of the compiler are more accurately a "translator",
or an "inspector", but people want Perl to have a "compiler
option" not an "inspector gadget". What can you do?

This document covers the use of the Perl compiler: which modules
it comprises, how to use the most important of the back end modules,
what problems there are, and how to work around them.

=head2 Layout

The compiler back ends are in the C<B::> hierarchy, and the front-end
(the module that you, the user of the compiler, will sometimes
interact with) is the O module.

Here are the important back ends to know about, with their status
expressed as a number from 0 (outline for later implementation) to
10 (if there's a bug in it, we're very surprised):

=over 4

=item B::Lint

Complains if it finds dubious constructs in your source code. Status:
6 (it works adequately, but only has a very limited number of areas
that it checks).

=item B::Deparse

Recreates the Perl source, making an attempt to format it coherently.
Status: 8 (it works nicely, but a few obscure things are missing).

=item B::Xref

Reports on the declaration and use of subroutines and variables.
Status: 8 (it works nicely, but still has a few lingering bugs).

=back

=head1 Using The Back Ends

The following sections describe how to use the various compiler back
ends. They're presented roughly in order of maturity, so that the
most stable and proven back ends are described first, and the most
experimental and incomplete back ends are described last.

The O module automatically enables the B<-c> flag to Perl, which
prevents Perl from executing your code once it has been compiled.
This is why all the back ends print:

    myperlprogram syntax OK

before producing any other output.

=head2 The Cross Referencing Back End

The cross referencing back end (B::Xref) produces a report on your program,
breaking down declarations and uses of subroutines and variables (and
formats) by file and subroutine. For instance, here's part of the
report from the I<pod2man> program that comes with Perl:

    Subroutine clear_noremap
      Package (lexical)
        $ready_to_print   i1069, 1079
      Package main
        $&                1086
        $.                1086
        $0                1086
        $1                1087
        $2                1085, 1085
        $3                1085, 1085
        $ARGV             1086
        %HTML_Escapes     1085, 1085

This shows the variables used in the subroutine C<clear_noremap>. The
variable C<$ready_to_print> is a my() (lexical) variable,
B<i>ntroduced (first declared with my()) on line 1069, and used on
line 1079. The variable C<$&> from the main package is used on 1086,
and so on.

A line number may be prefixed by a single letter:

=over 4

=item i

Lexical variable introduced (declared with my()) for the first time.

=item &

Subroutine or method call.

=item s

Subroutine defined.

=item r

Format defined.

=back

The most useful option the cross referencer has is to save the report
to a separate file. For instance, to save the report on
I<myperlprogram> to the file I<report>:

    $ perl -MO=Xref,-oreport myperlprogram

=head2 The Decompiling Back End

The Deparse back end turns your Perl source back into Perl source. It
can reformat along the way, making it useful as a deobfuscator. The
most basic way to use it is:

    $ perl -MO=Deparse myperlprogram

You'll notice immediately that Perl has no idea of how to paragraph
your code. You'll have to separate chunks of code from each other
with newlines by hand. However, watch what it will do with
one-liners:

    $ perl -MO=Deparse -e '$op=shift||die "usage: $0
    code [...]";chomp(@ARGV=<>)unless@ARGV; for(@ARGV){$was=$_;eval$op;
    die$@ if$@; rename$was,$_ unless$was eq $_}'
    -e syntax OK
    $op = shift @ARGV || die("usage: $0 code [...]");
    chomp(@ARGV = <ARGV>) unless @ARGV;
    foreach $_ (@ARGV) {
        $was = $_;
        eval $op;
        die $@ if $@;
        rename $was, $_ unless $was eq $_;
    }

The decompiler has several options for the code it generates. For
instance, you can set the size of each indent from 4 (as above) to
2 with:

    $ perl -MO=Deparse,-si2 myperlprogram

The B<-p> option adds parentheses where normally they are omitted:

    $ perl -MO=Deparse -e 'print "Hello, world\n"'
    -e syntax OK
    print "Hello, world\n";
    $ perl -MO=Deparse,-p -e 'print "Hello, world\n"'
    -e syntax OK
    print("Hello, world\n");

See L<B::Deparse> for more information on the formatting options.

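The same options can also be passed to C<B::Deparse> from inside a
program rather than through the O front end. The following is only a
minimal sketch: the subroutine C<greet> is a hypothetical example
introduced for illustration, and the exact text returned depends on
your Perl version and the pragmas in effect.

```perl
use strict;
use warnings;
use B::Deparse;

# Hypothetical subroutine, used only as input for the decompiler.
sub greet { print "Hello, world\n" }

# -p adds parentheses; -si2 sets the indent to two columns.
my $deparser = B::Deparse->new('-p', '-si2');

# coderef2text returns the deparsed body of the sub as a string,
# including the surrounding braces.
my $body = $deparser->coderef2text(\&greet);

print "sub greet $body\n";
```

Unlike the command-line form, this does not enable B<-c>, so the
rest of the program still runs.
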
=head2 The Lint Back End

The lint back end (B::Lint) inspects programs for poor style. One
programmer's bad style is another programmer's useful tool, so options
let you select what is complained about.

To run the style checker across your source code:

    $ perl -MO=Lint myperlprogram

To disable context checks and undefined subroutines:

    $ perl -MO=Lint,-context,-undefined-subs myperlprogram

See L<B::Lint> for information on the options.

=head1 Module List for the Compiler Suite

=over 4

=item B

This module is the introspective ("reflective" in Java terms)
module, which allows a Perl program to inspect its innards. The
back end modules all use this module to gain access to the compiled
parse tree. You, the user of a back end module, will not need to
interact with B.

=item O

This module is the front-end to the compiler's back ends. Normally
called something like this:

    $ perl -MO=Deparse myperlprogram

This is like saying C<use O 'Deparse'> in your Perl program.

=item B::Concise

This module prints a concise (but complete) version of the Perl parse
tree. Its output is more customizable than that of B::Terse or
B::Debug (and it can emulate them). This module is useful for people who
are writing their own back end, or who are learning about the Perl
internals. It's not useful to the average programmer.

=item B::Debug

This module dumps the Perl parse tree in verbose detail to STDOUT.
It's useful for people who are writing their own back end, or who
are learning about the Perl internals. It's not useful to the
average programmer.

=item B::Deparse

This module produces Perl source code from the compiled parse tree.
It is useful in debugging and deconstructing other people's code,
and as a pretty-printer for your own source. See
L</"The Decompiling Back End"> for details about usage.

=item B::Lint

This module inspects the compiled form of your source code for things
which, while some people frown on them, aren't necessarily bad enough
to justify a warning. For instance, use of an array in scalar context
without explicitly saying C<scalar(@array)> is something that Lint
can identify. See L</"The Lint Back End"> for details about usage.

=item B::Showlex

This module prints out the my() variables used in a function or a
file. To get a list of the my() variables used in the subroutine
mysub() defined in the file myperlprogram:

    $ perl -MO=Showlex,mysub myperlprogram

To get a list of the my() variables used in the file myperlprogram:

    $ perl -MO=Showlex myperlprogram

[BROKEN]

=item B::Terse

This module prints the contents of the parse tree, but without as much
information as B::Debug. For comparison, C<print "Hello, world.">
produced 96 lines of output from B::Debug, but only 6 from B::Terse.

This module is useful for people who are writing their own back end,
or who are learning about the Perl internals. It's not useful to the
average programmer.

=item B::Xref

This module prints a report on where the variables, subroutines, and
formats are defined and used within a program and the modules it
loads. See L</"The Cross Referencing Back End"> for details about
usage.

=back

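Users of the back ends do not need to call C<B> directly, but a small
sketch may help show what "inspecting its innards" means in practice.
The example below assumes a hypothetical subroutine C<example_sub> and
uses C<B::svref_2object> to look up the name under which that sub was
compiled. It is an illustration only, not how any particular back end
is implemented.

```perl
use strict;
use warnings;
use B ();

# Hypothetical subroutine to inspect.
sub example_sub { return 42 }

# svref_2object wraps a reference in a B object; for a code
# reference this is a B::CV.
my $cv = B::svref_2object(\&example_sub);

# The glob (GV) attached to the CV records the sub's name and the
# package (stash) it lives in.
my $name    = $cv->GV->NAME;
my $package = $cv->GV->STASH->NAME;

print "${package}::$name\n";
```

The back ends walk much larger structures (the whole op tree) with
the same kind of objects.
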
=head1 KNOWN PROBLEMS

BEGIN{} blocks are executed while compiling your code. Any external
state that is initialized in BEGIN{}, such as open files or database
connections, does not behave properly, because it is set up at compile
time rather than when the program runs. To work around
this, Perl has an INIT{} block that corresponds to code being executed
before your program begins running but after your program has finished
being compiled. Execution order: BEGIN{}, (possible save of state
through compiler back-end), INIT{}, program runs, END{}.

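The execution order described above can be observed with a small
script. This is a sketch for an ordinary run of the program (not a
run under a back end); the array C<@order> is a name introduced here
purely for illustration.

```perl
use strict;
use warnings;

our @order;

# Ordinary statement: runs at run time, after compilation and INIT.
push @order, 'main';

# INIT runs once compilation has finished, before run time begins.
INIT  { push @order, 'INIT' }

# BEGIN runs as soon as the parser has finished reading the block.
BEGIN { push @order, 'BEGIN' }

# END runs as the program exits.
END   { print join(' -> ', @order, 'END'), "\n" }
# prints: BEGIN -> INIT -> main -> END
```

Although the blocks appear in a different order in the source, they
run in phase order, so setup that must happen when the program runs
belongs in INIT{} rather than BEGIN{}.
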
=head1 AUTHOR

This document was originally written by Nathan Torkington, and is now
maintained by the perl5-porters mailing list
I<perl5-porters@perl.org>.

=cut