=head1 NAME

perlcompile - Introduction to the Perl Compiler-Translator

=head1 DESCRIPTION

Perl has always had a compiler: your source is compiled into an
internal form (a parse tree) which is then optimized before being
run.  Since version 5.005, Perl has shipped with a module
capable of inspecting the optimized parse tree (C<B>), and this has
been used to write many useful utilities, including a module that lets
you turn your Perl into C source code that can be compiled into a
native executable.

The C<B> module provides access to the parse tree, and other modules
("back ends") do things with the tree.  Some write it out as
semi-human-readable text.  Another traverses the parse tree to build a
cross-reference of which subroutines, formats, and variables are used
where.  Another checks your code for dubious constructs.  Yet another back
end dumps the parse tree back out as Perl source, acting as a source code
beautifier or deobfuscator.

Because its original purpose was to be a way to produce C code
corresponding to a Perl program, and in turn a native executable, the
C<B> module and its associated back ends are known as "the
compiler", even though they don't really compile anything.
Different parts of the compiler are more accurately a "translator",
or an "inspector", but people want Perl to have a "compiler
option" not an "inspector gadget".  What can you do?

This document covers the use of the Perl compiler: which modules
it comprises, how to use the most important of the back end modules,
what problems there are, and how to work around them.

=head2 Layout

The compiler back ends are in the C<B::> hierarchy, and the front-end
(the module that you, the user of the compiler, will sometimes
interact with) is the O module.

Here are the important back ends to know about, with their status
expressed as a number from 0 (outline for later implementation) to
10 (if there's a bug in it, we're very surprised):

=over 4

=item B::Lint

Complains if it finds dubious constructs in your source code.  Status:
6 (it works adequately, but only has a very limited number of areas
that it checks).

=item B::Deparse

Recreates the Perl source, making an attempt to format it coherently.
Status: 8 (it works nicely, but a few obscure things are missing).

=item B::Xref

Reports on the declaration and use of subroutines and variables.
Status: 8 (it works nicely, but still has a few lingering bugs).

=back

=head1 Using The Back Ends

The following sections describe how to use the various compiler back
ends.  They're presented roughly in order of maturity, so that the
most stable and proven back ends are described first, and the most
experimental and incomplete back ends are described last.

The O module automatically enables the B<-c> flag to Perl, which
prevents Perl from executing your code once it has been compiled.
This is why all the back ends print:

    myperlprogram syntax OK

before producing any other output.

=head2 The Cross Referencing Back End

The cross referencing back end (B::Xref) produces a report on your program,
breaking down declarations and uses of subroutines and variables (and
formats) by file and subroutine.  For instance, here's part of the
report from the I<pod2man> program that comes with Perl:

    Subroutine clear_noremap
      Package (lexical)
        $ready_to_print   i1069, 1079
      Package main
        $&                1086
        $.                1086
        $0                1086
        $1                1087
        $2                1085, 1085
        $3                1085, 1085
        $ARGV             1086
        %HTML_Escapes     1085, 1085

This shows the variables used in the subroutine C<clear_noremap>.  The
variable C<$ready_to_print> is a my() (lexical) variable,
B<i>ntroduced (first declared with my()) on line 1069, and used on
line 1079.  The variable C<$&> from the main package is used on 1086,
and so on.

A line number may be prefixed by a single letter:

=over 4

=item i

Lexical variable introduced (declared with my()) for the first time.

=item &

Subroutine or method call.

=item s

Subroutine defined.

=item r

Format defined.

=back

The most useful option the cross referencer has is to save the report
to a separate file.  For instance, to save the report on
I<myperlprogram> to the file I<report>:

    $ perl -MO=Xref,-oreport myperlprogram

=head2 The Decompiling Back End

The Deparse back end turns your Perl source back into Perl source.  It
can reformat along the way, making it useful as a deobfuscator.  The
most basic way to use it is:

    $ perl -MO=Deparse myperlprogram

You'll notice immediately that Perl has no idea of how to paragraph
your code.  You'll have to separate chunks of code from each other
with newlines by hand.  However, watch what it will do with
one-liners:

    $ perl -MO=Deparse -e '$op=shift||die "usage: $0
    code [...]";chomp(@ARGV=<>)unless@ARGV; for(@ARGV){$was=$_;eval$op;
    die$@ if$@; rename$was,$_ unless$was eq $_}'
    -e syntax OK
    $op = shift @ARGV || die("usage: $0 code [...]");
    chomp(@ARGV = <ARGV>) unless @ARGV;
    foreach $_ (@ARGV) {
        $was = $_;
        eval $op;
        die $@ if $@;
        rename $was, $_ unless $was eq $_;
    }

The decompiler has several options for the code it generates.  For
instance, you can set the size of each indent from 4 (as above) to
2 with:

    $ perl -MO=Deparse,-si2 myperlprogram

The B<-p> option adds parentheses where normally they are omitted:

    $ perl -MO=Deparse -e 'print "Hello, world\n"'
    -e syntax OK
    print "Hello, world\n";
    $ perl -MO=Deparse,-p -e 'print "Hello, world\n"'
    -e syntax OK
    print("Hello, world\n");

See L<B::Deparse> for more information on the formatting options.

=head2 The Lint Back End

The lint back end (B::Lint) inspects programs for poor style.  One
programmer's bad style is another programmer's useful tool, so options
let you select what is complained about.

To run the style checker across your source code:

    $ perl -MO=Lint myperlprogram

To disable the context and undefined-subroutine checks, prefix each
check name with C<no->:

    $ perl -MO=Lint,no-context,no-undefined-subs myperlprogram

See L<B::Lint> for information on the options.

=head1 Module List for the Compiler Suite

=over 4

=item B

This module is the introspective ("reflective" in Java terms)
module, which allows a Perl program to inspect its innards.  The
back end modules all use this module to gain access to the compiled
parse tree.  You, the user of a back end module, will not need to
interact with B.

=item O

This module is the front-end to the compiler's back ends.  It is
normally called like this:

    $ perl -MO=Deparse myperlprogram

This is like saying C<use O 'Deparse'> in your Perl program.

=item B::Concise

This module prints a concise (but complete) version of the Perl parse
tree.  Its output is more customizable than that of B::Terse or
B::Debug (and it can emulate them).  This module is useful for people
who are writing their own back end, or who are learning about the Perl
internals.  It's not useful to the average programmer.

=item B::Debug

This module dumps the Perl parse tree in verbose detail to STDOUT.
It's useful for people who are writing their own back end, or who
are learning about the Perl internals.  It's not useful to the
average programmer.

=item B::Deparse

This module produces Perl source code from the compiled parse tree.
It is useful for debugging and deconstructing other people's code,
and also as a pretty-printer for your own source.  See
L</"The Decompiling Back End"> for details about usage.

=item B::Lint

This module inspects the compiled form of your source code for things
which, while some people frown on them, aren't necessarily bad enough
to justify a warning.  For instance, use of an array in scalar context
without explicitly saying C<scalar(@array)> is something that Lint
can identify.  See L</"The Lint Back End"> for details about usage.

=item B::Showlex

This module prints out the my() variables used in a function or a
file.  To get a list of the my() variables used in the subroutine
mysub() defined in the file myperlprogram:

    $ perl -MO=Showlex,mysub myperlprogram

To get a list of the my() variables used in the file myperlprogram:

    $ perl -MO=Showlex myperlprogram

[BROKEN]

=item B::Terse

This module prints the contents of the parse tree, but without as much
information as B::Debug.  For comparison, C<print "Hello, world.">
produced 96 lines of output from B::Debug, but only 6 from B::Terse.

This module is useful for people who are writing their own back end,
or who are learning about the Perl internals.  It's not useful to the
average programmer.

=item B::Xref

This module prints a report on where the variables, subroutines, and
formats are defined and used within a program and the modules it
loads.  See L</"The Cross Referencing Back End"> for details about
usage.

=back

=head1 KNOWN PROBLEMS

BEGIN{} blocks are executed while compiling your code.  Any external
state that is initialized in BEGIN{}, such as open files or database
connections, does not behave properly.  To work around
this, Perl has an INIT{} block that corresponds to code being executed
before your program begins running but after your program has finished
being compiled.  Execution order: BEGIN{}, (possible save of state
through compiler back end), INIT{}, program runs, END{}.

=head1 AUTHOR

This document was originally written by Nathan Torkington, and is now
maintained by the perl5-porters mailing list
I<perl5-porters@perl.org>.

=cut