Commit | Line | Data |
---|---|---|
a0d0e21e LW |
1 | =head1 NAME |
2 | ||
f102b883 | 3 | perlmod - Perl modules (packages and symbol tables) |
a0d0e21e LW |
4 | |
5 | =head1 DESCRIPTION | |
6 | ||
7 | =head2 Packages | |
8 | ||
748a9306 | 9 | Perl provides a mechanism for alternative namespaces to protect packages |
5a964f20 TC |
10 | from stomping on each other's variables. In fact, there's really no such |
11 | thing as a global variable in Perl (although some identifiers default | |
12 | to the main package instead of the current one). The package statement | |
13 | declares the compilation unit as | |
f102b883 TC |
14 | being in the given namespace. The scope of the package declaration |
15 | is from the declaration itself through the end of the enclosing block, | |
16 | C<eval>, C<sub>, or end of file, whichever comes first (the same scope | |
17 | as the my() and local() operators). All further unqualified dynamic | |
5a964f20 TC |
18 | identifiers will be in this namespace. A package statement only affects |
19 | dynamic variables--including those you've used local() on--but | |
f102b883 TC |
20 | I<not> lexical variables created with my(). Typically it would be |
21 | the first declaration in a file to be included by the C<require> or | |
22 | C<use> operator. You can switch into a package in more than one place; | |
5a964f20 | 23 | it merely influences which symbol table is used by the compiler for the |
f102b883 TC |
24 | rest of that block. You can refer to variables and filehandles in other |
25 | packages by prefixing the identifier with the package name and a double | |
26 | colon: C<$Package::Variable>. If the package name is null, the C<main> | |
27 | package is assumed. That is, C<$::sail> is equivalent to C<$main::sail>. | |
a0d0e21e | 28 | |
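As a minimal sketch of the scoping and qualification rules above, the following uses a hypothetical package C<Boat> (the name and its variables are illustrative assumptions, not part of Perl) confined to a bare block:

```perl
use strict;
use warnings;

our $sail = 'mainsail';          # lives in package main: $main::sail

{
    package Boat;                # hypothetical package; in effect to the end of this block
    our $sail = 'jib';           # $Boat::sail, a separate variable
    my  $crew = 4;               # lexical: not stored in any symbol table

    # Unqualified $sail is Boat's; $::sail and $main::sail are the same variable.
    sub sails { return ($sail, $::sail, $main::sail) }
}

# Back in package main: the package statement's scope ended with the block.
my @seen      = Boat::sails();   # ('jib', 'mainsail', 'mainsail')
my $qualified = $Boat::sail;     # 'jib'
```

The bare block is only one way to limit a package statement; putting the package in its own file has the same effect.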
d3ebb66b GS |
29 | The old package delimiter was a single quote, but double colon is now the |
30 | preferred delimiter, in part because it's more readable to humans, and | |
31 | in part because it's more readable to B<emacs> macros. It also makes C++ | |
32 | programmers feel like they know what's going on--as opposed to using the | |
33 | single quote as separator, which was there to make Ada programmers feel | |
34 | like they knew what was going on. Because the old-fashioned syntax is still | |
35 | supported for backwards compatibility, if you try to use a string like | |
36 | C<"This is $owner's house">, you'll be accessing C<$owner::s>; that is, | |
37 | the $s variable in package C<owner>, which is probably not what you meant. | |
38 | Use braces to disambiguate, as in C<"This is ${owner}'s house">. | |
a0d0e21e LW |
39 | |
40 | Packages may be nested inside other packages: C<$OUTER::INNER::var>. This | |
41 | implies nothing about the order of name lookups, however. All symbols | |
42 | are either local to the current package, or must be fully qualified | |
43 | from the outer package name down. For instance, there is nowhere | |
44 | within package C<OUTER> that C<$INNER::var> refers to C<$OUTER::INNER::var>. | |
45 | It would treat package C<INNER> as a totally separate global package. | |
46 | ||
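The lookup rule for nested packages can be checked with a short sketch. The package names C<OUTER> and C<INNER> come from the text; the subroutine names are assumptions made for illustration:

```perl
use strict;
use warnings;

$OUTER::INNER::var = 'nested';       # variable in the nested package
$INNER::var        = 'top level';    # variable in an unrelated top-level package

{
    package OUTER;
    # $INNER::var is not resolved relative to OUTER; it always names
    # the top-level package INNER.
    sub relative  { return $INNER::var }
    sub qualified { return $OUTER::INNER::var }
}

my $relative  = OUTER::relative();   # 'top level'
my $qualified = OUTER::qualified();  # 'nested'
```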
47 | Only identifiers starting with letters (or underscore) are stored in a | |
cb1a09d0 | 48 | package's symbol table. All other symbols are kept in package C<main>, |
5a964f20 TC |
49 | including all of the punctuation variables like $_. In addition, when |
50 | unqualified, the identifiers STDIN, STDOUT, STDERR, ARGV, ARGVOUT, ENV, | |
51 | INC, and SIG are forced to be in package C<main>, even when used for other | |
52 | purposes than their built-in ones. Note also that, if you have a package | |
53 | called C<m>, C<s>, or C<y>, then you can't use the qualified form of an | |
54 | identifier because it will be interpreted instead as a pattern match, | |
55 | a substitution, or a transliteration. | |
a0d0e21e LW |
56 | |
57 | (Variables beginning with underscore used to be forced into package | |
58 | main, but we decided it was more useful for package writers to be able | |
cb1a09d0 AD |
59 | to use leading underscore to indicate private variables and method names. |
60 | $_ is still global though.) | |
a0d0e21e LW |
61 | |
62 | Eval()ed strings are compiled in the package in which the eval() was | |
63 | compiled. (Assignments to C<$SIG{}>, however, assume the signal | |
748a9306 | 64 | handler specified is in the C<main> package. Qualify the signal handler |
a0d0e21e LW |
65 | name if you wish to have a signal handler in a package.) For an |
66 | example, examine F<perldb.pl> in the Perl library. It initially switches | |
67 | to the C<DB> package so that the debugger doesn't interfere with variables | |
68 | in the script you are trying to debug. At various points, however, it | |
69 | temporarily switches back to the C<main> package to evaluate various | |
70 | expressions in the context of the C<main> package (or wherever you came | |
71 | from). See L<perldebug>. | |
72 | ||
f102b883 TC |
73 | The special symbol C<__PACKAGE__> contains the current package, but cannot |
74 | (easily) be used to construct variables. | |
75 | ||
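A small sketch of what C<__PACKAGE__> can and cannot do directly, assuming a hypothetical package C<Demo::Counter>. Building a variable name from it requires a symbolic reference, which is why C<strict 'refs'> is relaxed inside one subroutine:

```perl
use strict;
use warnings;

{
    package Demo::Counter;           # hypothetical package name
    our $count = 41;

    sub name { return __PACKAGE__ }  # the current package as a string

    sub bump {
        # __PACKAGE__ does not interpolate inside a variable name, so the
        # name is built as a string and used as a symbolic reference.
        no strict 'refs';
        return ++${ __PACKAGE__ . '::count' };
    }
}

my $name  = Demo::Counter::name();   # 'Demo::Counter'
my $count = Demo::Counter::bump();   # 42
```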
5f05dabc | 76 | See L<perlsub> for other scoping issues related to my() and local(), |
f102b883 | 77 | and L<perlref> regarding closures. |
cb1a09d0 | 78 | |
a0d0e21e LW |
79 | =head2 Symbol Tables |
80 | ||
aa689395 | 81 | The symbol table for a package happens to be stored in the hash of that |
82 | name with two colons appended. The main symbol table's name is thus | |
83 | C<%main::>, or C<%::> for short. Likewise, the symbol table for the nested | |
84 | package mentioned earlier is named C<%OUTER::INNER::>. | |
85 | ||
86 | The value in each entry of the hash is what you are referring to when you | |
87 | use the C<*name> typeglob notation. In fact, the following have the same | |
88 | effect, though the first is more efficient because it does the symbol | |
89 | table lookups at compile time: | |
a0d0e21e | 90 | |
f102b883 TC |
91 | local *main::foo = *main::bar; |
92 | local $main::{foo} = $main::{bar}; | |
a0d0e21e LW |
93 | |
94 | You can use this to print out all the variables in a package, for | |
5a964f20 TC |
95 | instance. The standard F<dumpvar.pl> library and the CPAN module |
96 | Devel::Symdump make use of this. | |
a0d0e21e | 97 | |
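As a rough sketch of walking a symbol table (not how F<dumpvar.pl> or Devel::Symdump are actually implemented), the following lists the identifiers in a hypothetical package C<Demo::Stash> and records which non-scalar slots of each typeglob are filled:

```perl
use strict;
use warnings;

{
    package Demo::Stash;             # hypothetical package
    our $scalar = 1;
    our @array  = (1, 2);
    sub routine { return 1 }
}

# Each key of %Demo::Stash:: is an identifier in that package.
my %filled;
for my $name (sort keys %Demo::Stash::) {
    no strict 'refs';                # typeglobs are looked up by name
    my $glob = \*{"Demo::Stash::$name"};
    # *glob{THING} is undef for an empty ARRAY, HASH, or CODE slot.
    $filled{$name} = join ',', grep { defined *{$glob}{$_} } qw(ARRAY HASH CODE);
}
```

The SCALAR slot is left out of the check because C<*glob{SCALAR}> returns a reference even when the scalar has never been used.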
cb1a09d0 | 98 | Assignment to a typeglob performs an aliasing operation, i.e., |
a0d0e21e LW |
99 | |
100 | *dick = *richard; | |
101 | ||
5a964f20 TC |
102 | causes variables, subroutines, formats, and file and directory handles |
103 | accessible via the identifier C<richard> also to be accessible via the | |
104 | identifier C<dick>. If you want to alias only a particular variable or | |
105 | subroutine, you can assign a reference instead: | |
a0d0e21e LW |
106 | |
107 | *dick = \$richard; | |
108 | ||
5a964f20 | 109 | This makes $richard and $dick the same variable, but leaves |
a0d0e21e LW |
110 | @richard and @dick as separate arrays. Tricky, eh? |
111 | ||
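The difference between the two kinds of typeglob assignment can be observed directly. This sketch reuses the names from the text and declares them with C<our> so that it runs under C<strict>:

```perl
use strict;
use warnings;

our $richard = 'scalar';
our @richard = (1, 2, 3);
our ($dick, @dick);

*dick = \$richard;                         # alias the scalar slot only
$dick = 'changed';                         # writes through to $richard

my $scalar_aliased = $richard;             # 'changed'
my $array_size     = scalar @dick;         # 0: @dick is still a separate array

*dick = *richard;                          # now alias every slot
my $array_aliased = \@dick == \@richard;   # true: one array, two names
```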
cb1a09d0 AD |
112 | This mechanism may be used to pass and return cheap references |
113 | into or from subroutines if you don't want to copy the whole | |
5a964f20 TC |
114 | thing. It only works when assigning to dynamic variables, not |
115 | lexicals. | |
cb1a09d0 | 116 | |
5a964f20 | 117 | %some_hash = (); # can't be my() |
cb1a09d0 AD |
118 | *some_hash = fn( \%another_hash ); |
119 | sub fn { | |
120 | local *hashsym = shift; | |
121 | # now use %hashsym normally, and you | |
122 | # will affect the caller's %another_hash | |
123 | my %nhash = (); # do what you want | |
5f05dabc | 124 | return \%nhash; |
cb1a09d0 AD |
125 | } |
126 | ||
5f05dabc | 127 | On return, the reference will overwrite the hash slot in the |
cb1a09d0 | 128 | symbol table specified by the *some_hash typeglob. This |
c36e9b62 | 129 | is a somewhat tricky way of passing around references cheaply |
cb1a09d0 AD |
130 | when you don't want to have to remember to dereference variables |
131 | explicitly. | |
132 | ||
133 | Another use of symbol tables is for making "constant" scalars. | |
134 | ||
135 | *PI = \3.14159265358979; | |
136 | ||
137 | Now you cannot alter $PI, which is probably a good thing all in all. | |
5a964f20 TC |
138 | This isn't the same as a constant subroutine, which is subject to |
139 | optimization at compile-time. This isn't. A constant subroutine is one | |
140 | prototyped to take no arguments and to return a constant expression. | |
141 | See L<perlsub> for details on these. The C<use constant> pragma is a | |
142 | convenient shorthand for these. | |
cb1a09d0 | 143 | |
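A brief sketch contrasting the two approaches: a read-only scalar made through the symbol table, and a constant subroutine created with C<use constant>. The constant name C<E> and its value are illustrative assumptions:

```perl
use strict;
use warnings;
use constant E => 2.718281828;     # constant subroutine

our $PI;
*PI = \3.14159265358979;           # $PI now aliases a read-only literal

my $ok     = eval { $PI = 3; 1 };  # the assignment dies inside the eval
my $error  = $@;                   # mentions a read-only value
my $area   = $PI * 2**2;           # reading $PI still works
my $double = 2 * E;                # the constant is called like a function
```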
55497cff | 144 | You can say C<*foo{PACKAGE}> and C<*foo{NAME}> to find out what name and |
145 | package the *foo symbol table entry comes from. This may be useful | |
5a964f20 | 146 | in a subroutine that gets passed typeglobs as arguments: |
55497cff | 147 | |
148 | sub identify_typeglob { | |
149 | my $glob = shift; | |
150 | print 'You gave me ', *{$glob}{PACKAGE}, '::', *{$glob}{NAME}, "\n"; | |
151 | } | |
152 | identify_typeglob *foo; | |
153 | identify_typeglob *bar::baz; | |
154 | ||
155 | This prints | |
156 | ||
157 | You gave me main::foo | |
158 | You gave me bar::baz | |
159 | ||
160 | The *foo{THING} notation can also be used to obtain references to the | |
161 | individual elements of *foo; see L<perlref>. | |
162 | ||
a0d0e21e LW |
163 | =head2 Package Constructors and Destructors |
164 | ||
165 | There are two special subroutine definitions that function as package | |
166 | constructors and destructors. These are the C<BEGIN> and C<END> | |
167 | routines. The C<sub> is optional for these routines. | |
168 | ||
f102b883 TC |
169 | A C<BEGIN> subroutine is executed as soon as possible, that is, the moment |
170 | it is completely defined, even before the rest of the containing file | |
171 | is parsed. You may have multiple C<BEGIN> blocks within a file--they | |
172 | will execute in order of definition. Because a C<BEGIN> block executes | |
173 | immediately, it can pull in definitions of subroutines and such from other | |
174 | files in time to be visible to the rest of the file. Once a C<BEGIN> | |
175 | has run, it is immediately undefined and any code it used is returned to | |
176 | Perl's memory pool. This means you can't ever explicitly call a C<BEGIN>. | |
a0d0e21e | 177 | |
5a964f20 TC |
178 | An C<END> subroutine is executed as late as possible, that is, when |
179 | the interpreter is being exited, even if it is exiting as a result of | |
180 | a die() function. (But not if it's polymorphing into another program | |
181 | via C<exec>, or being blown out of the water by a signal--you have to | |
182 | trap that yourself (if you can).) You may have multiple C<END> blocks | |
183 | within a file--they will execute in reverse order of definition; that is: | |
184 | last in, first out (LIFO). | |
a0d0e21e | 185 | |
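The ordering rules for C<BEGIN> and C<END> can be sketched as follows. Because C<END> blocks run only when an interpreter exits, the sketch observes them in a child perl started through C<$^X>; the messages and variable names are assumptions made for illustration:

```perl
use strict;
use warnings;

our @order;
push @order, 'run time';
BEGIN { push @order, 'first BEGIN' }     # runs as soon as it is compiled
BEGIN { push @order, 'second BEGIN' }
# @order is now ('first BEGIN', 'second BEGIN', 'run time')

# END blocks run at interpreter exit, in reverse order of definition.
my $program = 'END { print "first END defined\n" } '
            . 'END { print "second END defined\n" } '
            . 'print "main code\n";';
open my $child, '-|', $^X, '-e', $program or die "cannot start $^X: $!";
chomp(my @output = <$child>);
close $child;
# @output is ('main code', 'second END defined', 'first END defined')
```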
5a964f20 | 186 | Inside an C<END> subroutine, C<$?> contains the value that the script is |
c36e9b62 | 187 | going to pass to C<exit()>. You can modify C<$?> to change the exit |
f102b883 | 188 | value of the script. Beware of changing C<$?> by accident (e.g. by |
c36e9b62 | 189 | running something via C<system>). |
190 | ||
5a964f20 TC |
191 | Note that when you use the B<-n> and B<-p> switches to Perl, C<BEGIN> and |
192 | C<END> work just as they do in B<awk>, as a degenerate case. As currently | |
193 | implemented (and subject to change, since it's inconvenient at best), | |
194 | both C<BEGIN> I<and> C<END> blocks are run when you use the B<-c> switch | |
195 | for a compile-only syntax check, although your main code is not. | |
a0d0e21e LW |
196 | |
197 | =head2 Perl Classes | |
198 | ||
4633a7c4 | 199 | There is no special class syntax in Perl, but a package may function |
5a964f20 TC |
200 | as a class if it provides subroutines to act as methods. Such a |
201 | package may also derive some of its methods from another class (package) | |
202 | by listing the other package name in its global @ISA array (which | |
203 | must be a package global, not a lexical). | |
4633a7c4 | 204 | |
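A minimal sketch of a package acting as a class and deriving methods through C<@ISA>. The class names C<Demo::Animal> and C<Demo::Dog> and their methods are hypothetical:

```perl
use strict;
use warnings;

{
    package Demo::Animal;                 # hypothetical base class
    sub new   { my ($class, %args) = @_; return bless {%args}, $class }
    sub name  { return $_[0]{name} }
    sub speak { my $self = shift; return $self->name . ' makes a sound' }
}

{
    package Demo::Dog;                    # hypothetical subclass
    our @ISA = ('Demo::Animal');          # the package variable, not my @ISA
    sub speak { my $self = shift; return $self->name . ' says Woof' }
}

my $dog    = Demo::Dog->new(name => 'Rex');   # new() is inherited
my $spoken = $dog->speak;                     # 'Rex says Woof'
my $is_a   = $dog->isa('Demo::Animal');       # true
```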
f102b883 | 205 | For more on this, see L<perltoot> and L<perlobj>. |
a0d0e21e LW |
206 | |
207 | =head2 Perl Modules | |
208 | ||
c07a80fd | 209 | A module is just a package that is defined in a library file of |
a0d0e21e LW |
210 | the same name, and is designed to be reusable. It may do this by |
211 | providing a mechanism for exporting some of its symbols into the symbol | |
212 | table of any package using it. Or it may function as a class | |
213 | definition and make its semantics available implicitly through method | |
214 | calls on the class and its objects, without explicit exportation of any | |
215 | symbols. Or it can do a little of both. | |
216 | ||
9607fc9c | 217 | For example, to start a normal module called Some::Module, create |
218 | a file called Some/Module.pm and start with this template: | |
219 | ||
220 | package Some::Module; # assumes Some/Module.pm | |
221 | ||
222 | use strict; | |
223 | ||
224 | BEGIN { | |
225 | use Exporter (); | |
226 | use vars qw($VERSION @ISA @EXPORT @EXPORT_OK %EXPORT_TAGS); | |
227 | ||
228 | # set the version for version checking | |
229 | $VERSION = 1.00; | |
230 | # if using RCS/CVS, this may be preferred | |
231 | $VERSION = do { my @r = (q$Revision: 2.21 $ =~ /\d+/g); sprintf "%d."."%02d" x $#r, @r }; # must be all one line, for MakeMaker | |
232 | ||
233 | @ISA = qw(Exporter); | |
234 | @EXPORT = qw(&func1 &func2); | |
235 | %EXPORT_TAGS = ( ); # eg: TAG => [ qw!name1 name2! ], | |
236 | ||
237 | # your exported package globals go here, | |
238 | # as well as any optionally exported functions | |
239 | @EXPORT_OK = qw($Var1 %Hashit &func3); | |
240 | } | |
241 | use vars @EXPORT_OK; | |
242 | ||
243 | # non-exported package globals go here | |
244 | use vars qw(@more $stuff); | |
245 | ||
c2611fb3 | 246 | # initialize package globals, first exported ones |
9607fc9c | 247 | $Var1 = ''; |
248 | %Hashit = (); | |
249 | ||
250 | # then the others (which are still accessible as $Some::Module::stuff) | |
251 | $stuff = ''; | |
252 | @more = (); | |
253 | ||
254 | # all file-scoped lexicals must be created before | |
255 | # the functions below that use them. | |
256 | ||
257 | # file-private lexicals go here | |
258 | my $priv_var = ''; | |
259 | my %secret_hash = (); | |
260 | ||
261 | # here's a file-private function as a closure, | |
262 | # callable as &$priv_func; it cannot be prototyped. | |
263 | my $priv_func = sub { | |
264 | # stuff goes here. | |
265 | }; | |
266 | ||
267 | # make all your functions, whether exported or not; | |
268 | # remember to put something interesting in the {} stubs | |
269 | sub func1 {} # no prototype | |
270 | sub func2() {} # proto'd void | |
271 | sub func3($$) {} # proto'd to 2 scalars | |
272 | ||
273 | # this one isn't exported, but could be called! | |
274 | sub func4(\%) {} # proto'd to 1 hash ref | |
275 | ||
276 | END { } # module clean-up code here (global destructor) | |
4633a7c4 LW |
277 | |
278 | Then go on to declare and use your variables in functions | |
279 | without any qualifications. | |
f102b883 | 280 | See L<Exporter> and L<perlmodlib> for details on |
4633a7c4 LW |
281 | mechanics and style issues in module creation. |
282 | ||
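To see the export mechanism at work without creating a file, here is a reduced sketch that defines a hypothetical module C<Demo::Shapes> inline and calls its C<import> method by hand. With a real F<Demo/Shapes.pm> on disk, the C<import> call would be replaced by C<use Demo::Shapes 'area';>.

```perl
use strict;
use warnings;

{
    package Demo::Shapes;                 # hypothetical module, defined inline
    use Exporter ();
    our @ISA       = ('Exporter');
    our @EXPORT_OK = ('area');            # exported only on request
    sub area    { my ($w, $h) = @_; return $w * $h }
    sub _secret { return 'not exported' }
}

# Ask Exporter to install Demo::Shapes::area as main::area.
Demo::Shapes->import('area');

my $area       = area(2, 3);              # 6, via the imported name
my $has_secret = defined &main::_secret;  # false: it was never exported
```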
283 | Perl modules are included into your program by saying | |
a0d0e21e LW |
284 | |
285 | use Module; | |
286 | ||
287 | or | |
288 | ||
289 | use Module LIST; | |
290 | ||
291 | This is exactly equivalent to | |
292 | ||
5a964f20 | 293 | BEGIN { require Module; import Module; } |
a0d0e21e LW |
294 | |
295 | or | |
296 | ||
5a964f20 | 297 | BEGIN { require Module; import Module LIST; } |
a0d0e21e | 298 | |
cb1a09d0 AD |
299 | As a special case |
300 | ||
301 | use Module (); | |
302 | ||
303 | is exactly equivalent to | |
304 | ||
5a964f20 | 305 | BEGIN { require Module; } |
cb1a09d0 | 306 | |
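The equivalences above can be tried with modules from the standard distribution. This sketch uses the method-call form C<< Module->import(LIST) >> in place of the indirect-object form shown above; the choice of List::Util and File::Basename is only for illustration:

```perl
use strict;
use warnings;

# Same effect as: use List::Util qw(sum max);
BEGIN {
    require List::Util;
    List::Util->import(qw(sum max));
}

# Same effect as: use File::Basename ();
BEGIN { require File::Basename; }

my $total   = sum 1, 2, 3;        # imported at compile time: a list operator
my $largest = max 4, 9, 2;
my $base    = File::Basename::basename('/tmp/report.txt');
my $imported_basename = defined &main::basename;   # false: nothing imported
```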
a0d0e21e LW |
307 | All Perl module files have the extension F<.pm>. C<use> assumes this so |
308 | that you don't have to spell out "F<Module.pm>" in quotes. This also | |
309 | helps to differentiate new modules from old F<.pl> and F<.ph> files. | |
310 | Module names are also capitalized unless they're functioning as pragmas. | |
311 | "Pragmas" are in effect compiler directives, and are sometimes called | |
312 | "pragmatic modules" (or even "pragmata" if you're a classicist). | |
313 | ||
5a964f20 TC |
314 | The two statements: |
315 | ||
316 | require SomeModule; | |
317 | require "SomeModule.pm"; | |
318 | ||
319 | differ from each other in two ways. In the first case, any double | |
320 | colons in the module name, such as C<Some::Module>, are translated | |
321 | into your system's directory separator, usually "/". The second | |
322 | case does not, and would have to be specified literally. The other difference | |
323 | is that seeing the first C<require> clues in the compiler that uses of | |
324 | indirect object notation involving "SomeModule", as in C<$ob = purge SomeModule>, | |
325 | are method calls, not function calls. (Yes, this really can make a difference.) | |
326 | ||
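A short sketch of the first difference, using two standard modules chosen only for illustration. The bareword form has its double colons translated into a path, while a string is taken literally:

```perl
use strict;
use warnings;

require File::Spec;            # bareword: looked up as File/Spec.pm
require "File/Basename.pm";    # string: the path is used as written

# %INC records every loaded file under its path-style name.
my @loaded = grep { exists $INC{$_} } qw(File/Spec.pm File/Basename.pm);

# A string containing "::" is not translated, so no such file is found.
my $literal_ok = eval { require "File::Basename"; 1 };
```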
a0d0e21e LW |
327 | Because the C<use> statement implies a C<BEGIN> block, the importation |
328 | of semantics happens at the moment the C<use> statement is compiled, | |
329 | before the rest of the file is compiled. This is how it is able | |
330 | to function as a pragma mechanism, and also how modules are able to | |
331 | declare subroutines that are then visible as list operators for | |
332 | the rest of the current file. This will not work if you use C<require> | |
cb1a09d0 | 333 | instead of C<use>. With require you can get into this problem: |
a0d0e21e LW |
334 | |
335 | require Cwd; # make Cwd:: accessible | |
54310121 | 336 | $here = Cwd::getcwd(); |
a0d0e21e | 337 | |
5f05dabc | 338 | use Cwd; # import names from Cwd:: |
a0d0e21e LW |
339 | $here = getcwd(); |
340 | ||
341 | require Cwd; # make Cwd:: accessible | |
342 | $here = getcwd(); # oops! no main::getcwd() | |
343 | ||
5a964f20 TC |
344 | In general, C<use Module ()> is recommended over C<require Module>, |
345 | because it determines module availability at compile time, not in the | |
346 | middle of your program's execution. An exception would be if two modules | |
347 | each tried to C<use> each other, and each also called a function from | |
348 | that other module. In that case, it's easy to use C<require>s instead. | |
cb1a09d0 | 349 | |
a0d0e21e LW |
350 | Perl packages may be nested inside other package names, so we can have |
351 | package names containing C<::>. But if we used that package name | |
352 | directly as a filename it would make for unwieldy or impossible | |
353 | filenames on some systems. Therefore, if a module's name is, say, | |
354 | C<Text::Soundex>, then its definition is actually found in the library | |
355 | file F<Text/Soundex.pm>. | |
356 | ||
357 | Perl modules always have a F<.pm> file, but there may also be dynamically | |
358 | linked executables or autoloaded subroutine definitions associated with | |
359 | the module. If so, these will be entirely transparent to the user of | |
360 | the module. It is the responsibility of the F<.pm> file to load (or | |
361 | arrange to autoload) any additional functionality. The POSIX module | |
362 | happens to do both dynamic loading and autoloading, but the user can | |
5f05dabc | 363 | say just C<use POSIX> to get it all. |
a0d0e21e | 364 | |
f102b883 | 365 | For more information on writing extension modules, see L<perlxstut> |
a0d0e21e LW |
366 | and L<perlguts>. |
367 | ||
f102b883 | 368 | =head1 SEE ALSO |
cb1a09d0 | 369 | |
f102b883 TC |
370 | See L<perlmodlib> for general style issues related to building Perl |
371 | modules and classes as well as descriptions of the standard library and | |
372 | CPAN, L<Exporter> for how Perl's standard import/export mechanism works, | |
373 | L<perltoot> for an in-depth tutorial on creating classes, L<perlobj> | |
374 | for a hard-core reference document on objects, and L<perlsub> for an | |
375 | explanation of functions and scoping. |