'perlfaq' =>
{
'MAINTAINER' => 'perlfaq',
- 'DISTRIBUTION' => 'LLAP/perlfaq-5.01500302.tar.gz',
+ 'DISTRIBUTION' => 'LLAP/perlfaq-5.0150033.tar.gz',
'FILES' => q[cpan/perlfaq],
'EXCLUDED' => [ qw(t/release-pod-syntax.t) ],
'UPSTREAM' => 'cpan',
'Math::BigInt' =>
{
'MAINTAINER' => 'rafl',
- 'DISTRIBUTION' => 'PJACKLAM/Math-BigInt-1.993.tar.gz',
+ 'DISTRIBUTION' => 'PJACKLAM/Math-BigInt-1.997.tar.gz',
'FILES' => q[dist/Math-BigInt],
'EXCLUDED' => [ qr{^inc/},
qr{^examples/},
'Math::BigInt::FastCalc' =>
{
'MAINTAINER' => 'rafl',
- 'DISTRIBUTION' => 'FLORA/Math-BigInt-FastCalc-0.29.tar.gz',
+ 'DISTRIBUTION' => 'PJACKLAM/Math-BigInt-FastCalc-0.30.tar.gz',
'FILES' => q[dist/Math-BigInt-FastCalc],
'EXCLUDED' => [ qr{^inc/},
qw{
package perlfaq;
-BEGIN {
- $perlfaq::VERSION = '5.01500302';
+{
+ $perlfaq::VERSION = '5.0150033';
}
0; # not is it supposed to be loaded
you should have the perlfaq. You should also have the C<perldoc> tool
that lets you read the L<perlfaq>:
- $ perldoc perlfaq
+ $ perldoc perlfaq
or search the perlfaq question headings:
- $ perldoc -q open
+ $ perldoc -q open
See L<perldoc> for more information.
drives http://faq.perl.org/ and will be distributed with the next
release of Perl 5.
-You can mail corrections, additions, and suggestions to
-C<< <perlfaq-workers AT perl DOT org> >>. The perlfaq volunteers use this
-address to coordinate their efforts and track the perlfaq development.
-They appreciate your contributions to the FAQ but do not have time to
-provide individual help, so don't use this address to ask FAQs.
-
-The perlfaq server posts extracts of the perlfaq to that newsgroup
-every 6 hours (or so), and the community of volunteers reviews and
-updates the answers. If you'd like to help review and update the
-answers, check out comp.lang.perl.misc.
-
-You can also fork the perl repository, make your changes, and send them
-to Perl 5 Porters. See L<perlgit>.
-
-=head2 What will happen if you mail your Perl programming problems to the authors?
-
-The perlfaq-workers like to keep all traffic on the perlfaq-workers list
-so that everyone can see the work being done (and the work that needs to
-be done). The mailing list serves as an official record. If you email the
-authors or maintainers directly, you'll probably get a reply asking you
-to post to the mailing list. If you don't get a reply, it probably means
-that the person never saw the message or didn't have time to deal with
-it. Posting to the list allows the volunteers with time to deal with it
-when others are busy.
+=head2 What if my question isn't answered in the FAQ?
If you have a question that isn't in the FAQ and you would like help with
it, try the resources in L<perlfaq2>.
=head1 CREDITS
Tom Christiansen wrote the original perlfaq then expanded it with the
-help of Nat Torkington. The perlfaq-workers maintain current document
-and the denizens of comp.lang.perl.misc regularly review and update the
-FAQ. Several people have contributed answers, corrections, and comments,
-and the perlfaq notes those contributions wherever appropriate.
+help of Nat Torkington. brian d foy substantialy edited and expanded
+the perlfaq. perlfaq-workers and others have also supplied feedback
+and corrections over the years.
=head1 AUTHOR AND COPYRIGHT
=item *
-There is no Perl 6 release scheduled, but it will be available when
-it's ready. The joke is that it's scheduled for Christmas, but that we
-just don't know which one. Stay tuned, but don't worry that you'll
-have to change major versions of Perl; no one is going to take Perl 5
-away from you.
+The current leading implementation of Perl 6, Rakudo, released a "useful,
+usable, 'early adopter'" distribution of Perl 6 (called Rakudo Star) in July of
+2010. Please see http://rakudo.org/ for more information.
=item *
(contributed by brian d foy)
-In short, Perl 4 is the past, Perl 5 is the present, and Perl 6 is the
-future.
+In short, Perl 4 is the parent to both Perl 5 and Perl 6. Perl 5 is the older
+sibling, and though they are different languages, someone who knows one will
+spot many similarities in the other.
The number after Perl (i.e. the 5 after Perl 5) is the major release
of the perl interpreter as well as the version of the language. Each
concept of references, complex data structures, and modules. The Perl
5 interpreter was a complete re-write of the previous perl sources.
-Perl 6 is the next major version of Perl, although it's not intended to
-replace Perl 5. It's still in development in both its syntax and
-design. The work started in 2002 and is still ongoing. Some of the
-most interesting features have shown up in the latest versions of Perl
-5, and some Perl 5 modules allow you to use some Perl 6 syntax in your
-programs. The current leading implementation of Perl 6 is Rakudo (
-http://rakudo.org ).
+Perl 6 was originally described as the community's rewrite of Perl 5.
+Development started in 2002; syntax and design work continue to this day. As the
+language has evolved, it has become clear that it is a separate language,
+incompatible with Perl 5 but in the same language family. Contrary to popular
+belief, Perl 6 and Perl 5 peacefully coexist with one another. That said, Perl 6
+has proven to be a fascinating source of ideas for those using Perl 5 (the Moose
+object system is a well-known example). There is overlap in the communities, and
+this overlap fosters the tradition of sharing and borrowing that have been
+instrumental to Perl's success. The current leading implementation of Perl 6 is
+Rakudo, and you can learn more about it at http://rakudo.org.
See L<perlhist> for a history of Perl revisions.
the crusade to make Perl a better place then read the Perl 6 developers
page at http://dev.perl.org/perl6/ and get involved.
-Perl 6 is not scheduled for release yet, and Perl 5 will still be supported
-for quite awhile after its release. Do not wait for Perl 6 to do whatever
-you need to do.
-
"We're really serious about reinventing everything that needs reinventing."
--Larry Wall
+As Perl 6 is a reinvention of Perl, it is a language in the same lineage but
+not compatible. The two are complementary, not mutually exclusive. Perl 6 is not
+meant to replace Perl 5, and vice versa.
+
=head2 How stable is Perl?
Production releases, which incorporate bug fixes and new functionality,
previously used the phrase with many subjects ("Just another x hacker,"),
so to distinguish his JAPH, he started to write them as Perl programs:
- print "Just another Perl hacker,";
+ print "Just another Perl hacker,";
Other people picked up on this and started to write clever or obfuscated
programs to produce the same output, spinning things quickly out of
=head2 How can I get a binary version of perl?
-(contributed by brian d foy)
-
-ActiveState: Windows, Linux, Mac OS X, Solaris, AIX and HP-UX
-
- http://www.activestate.com/
-
-Sunfreeware.com: Solaris 2.5 to Solaris 10 (SPARC and x86)
-
- http://www.sunfreeware.com/
-
-Strawberry Perl: Windows, Perl 5.8.8 and 5.10.0
-
- http://www.strawberryperl.com
-
-IndigoPerl: Windows
-
- http://indigostar.com/
+See L<CPAN Ports|http://www.cpan.org/ports/>
=head2 I don't have a C compiler. How can I build my own Perl interpreter?
You might look around the net for a pre-built binary of Perl (or a
C compiler!) that meets your needs, though:
-For Windows, Vanilla Perl ( http://vanillaperl.com/ ) and Strawberry Perl
-( http://strawberryperl.com/ ) come with a
+For Windows, L<Vanilla Perl|http://vanillaperl.com/> and
+L<Strawberry Perl|http://strawberryperl.com/> come with a
bundled C compiler. ActivePerl is a pre-compiled version of Perl
ready-to-use.
and L<perlxstut> for linking C and Perl together. There may be more
by the time you read this. These URLs might also be useful:
- http://perldoc.perl.org/
- http://www.perl.org/
- http://learn.perl.org/
+=over 4
+
+=item * L<Perldoc|http://perldoc.perl.org/>
+
+=item * L<Perl.org|http://www.perl.org/>
+
+=item * L<Learn.perl.org|http://learn.perl.org/>
+
+=back
=head2 What are the Perl newsgroups on Usenet? Where do I post questions?
Several groups devoted to the Perl language are on Usenet:
- comp.lang.perl.announce Moderated announcement group
- comp.lang.perl.misc High traffic general Perl discussion
- comp.lang.perl.moderated Moderated discussion group
- comp.lang.perl.modules Use and development of Perl modules
- comp.lang.perl.tk Using Tk (and X) from Perl
+ comp.lang.perl.announce Moderated announcement group
+ comp.lang.perl.misc High traffic general Perl discussion
+ comp.lang.perl.moderated Moderated discussion group
+ comp.lang.perl.modules Use and development of Perl modules
+ comp.lang.perl.tk Using Tk (and X) from Perl
Some years ago, comp.lang.perl was divided into those groups, and
comp.lang.perl itself officially removed. While that group may still
=head2 What mailing lists are there for Perl?
-Most of the major modules (C<Tk>, C<CGI>, C<libwww-perl>) have their own
+Most of the major modules (L<Tk>, L<CGI>, L<libwww-perl>) have their own
mailing lists. Consult the documentation that came with the module for
subscription information.
-A comprehensive list of Perl-related mailing lists can be found at:
-
- http://lists.perl.org/
+A comprehensive list of Perl-related mailing lists can be found at
+http://lists.perl.org/
=head2 Where are the archives for comp.lang.perl.misc?
sends the message to the right place.
To determine if a module came with your version of Perl, you can
-use the C<Module::CoreList> module. It has the information about
+use the L<Module::CoreList> module. It has the information about
the modules (with their versions) included with each release of Perl.
-If C<Module::CoreList> is not installed on your system, check out
+If L<Module::CoreList> is not installed on your system, check out
http://perlpunks.de/corelist .
Every CPAN module has a bug tracker set up in RT, http://rt.cpan.org .
You can submit bugs to RT either through its web interface or by
email. To email a bug report, send it to
bug-E<lt>distribution-nameE<gt>@rt.cpan.org . For example, if you
-wanted to report a bug in C<Business::ISBN>, you could send a message to
+wanted to report a bug in L<Business::ISBN>, you could send a message to
bug-Business-ISBN@rt.cpan.org .
Some modules might have special reporting requirements, such as a
=head2 What is perl.com? Perl Mongers? pm.org? perl.org? cpan.org?
-Perl.com ( http://www.perl.com/ ) used to be part of the O'Reilly
+L<Perl.com|http://www.perl.com/> used to be part of the O'Reilly
Network, a subsidiary of O'Reilly Media. Although it retains most of
the original content from its O'Reilly Network, it is now hosted by
The Perl Foundation.
The Perl Foundation is an advocacy organization for the Perl language
-which maintains the web site ( http://www.perl.org/ ) as a general
+which maintains the web site L<Perl.org|http://www.perl.org/> as a general
advocacy site for the Perl language. It uses the domain to provide
general support services to the Perl community, including the hosting
of mailing lists, web sites, and other services. There are also many
other sub-domains for special topics like learning Perl, Perl news,
jobs in Perl, such as:
- http://www.perl.org/
- http://learn.perl.org/
- http://jobs.perl.org/
- http://lists.perl.org/
+=over 4
+
+=item * L<Perl.org|http://www.perl.org/>
+
+=item * L<Learn.perl.org|http://learn.perl.org/>
+
+=item * L<Jobs.perl.org|http://jobs.perl.org/>
+
+=item * L<Lists.perl.org|http://lists.perl.org/>
+
+=back
Perl Mongers uses the pm.org domain for services related to Perl user
groups, including the hosting of mailing lists and web sites. See the
-Perl Mongers website ( http://www.pm.org/ ) for more information about
+L<Perl Mongers website|http://www.pm.org/> for more information about
joining, starting, or requesting services for a Perl user group.
-CPAN, or the Comprehensive Perl Archive Network (
-http://www.cpan.org/ ), is a replicated, worldwide repository of Perl
-software.
+CPAN, or the L<Comprehensive Perl Archive Network|http://www.cpan.org/>,
+is a replicated, worldwide repository of Perl software.
See L<What is CPAN?|/"What modules and extensions are available for Perl? What is CPAN? What does CPANE<sol>srcE<sol>... mean?">.
=head1 AUTHOR AND COPYRIGHT
someone has already written a module that can solve your problem.
Have you read the appropriate manpages? Here's a brief index:
- Basics perldata, perlvar, perlsyn, perlop, perlsub
- Execution perlrun, perldebug
- Functions perlfunc
- Objects perlref, perlmod, perlobj, perltie
- Data Structures perlref, perllol, perldsc
- Modules perlmod, perlmodlib, perlsub
- Regexes perlre, perlfunc, perlop, perllocale
- Moving to perl5 perltrap, perl
- Linking w/C perlxstut, perlxs, perlcall, perlguts, perlembed
- Various http://www.cpan.org/misc/olddoc/FMTEYEWTK.tgz
- (not a man-page but still useful, a collection
- of various essays on Perl techniques)
+=over 4
+
+=item Basics
+
+perldata, perlvar, perlsyn, perlop, perlsub
+
+=item Execution
+
+perlrun, perldebug
+
+=item Functions
+
+perlfunc
+
+=item Objects
+
+perlref, perlmod, perlobj, perltie
+
+=item Data Structures
+
+perlref, perllol, perldsc
+
+=item Modules
+
+perlmod, perlmodlib, perlsub
+
+=item Regexes
+
+perlre, perlfunc, perlop, perllocale
+
+=item Moving to perl5
+
+perltrap, perl
+
+=item Linking with C
+
+perlxstut, perlxs, perlcall, perlguts, perlembed
+
+=item Various
+
+http://www.cpan.org/misc/olddoc/FMTEYEWTK.tgz
+(not a man-page but still useful, a collection of various essays on
+Perl techniques)
+
+=back
A crude table of contents for the Perl manpage set is found in L<perltoc>.
=head2 How can I use Perl interactively?
The typical approach uses the Perl debugger, described in the
-C<perldebug(1)> manpage, on an "empty" program, like this:
+L<perldebug(1)> manpage, on an "empty" program, like this:
perl -de 42
=head2 Is there a Perl shell?
-The C<psh> (Perl sh) is currently at version 1.8. The Perl Shell is a shell
+The L<psh> (Perl sh) is currently at version 1.8. The Perl Shell is a shell
that combines the interactive nature of a Unix shell with the power of
Perl. The goal is a full-featured shell that behaves as expected for
normal shell activity and uses Perl syntax and functionality for
-control-flow statements and other things. You can get C<psh> at
-http://sourceforge.net/projects/psh/ .
+control-flow statements and other things. You can get L<psh> at
+https://metacpan.org/release/psh .
-C<Zoidberg> is a similar project and provides a shell written in perl,
+L<Zoidberg> is a similar project and provides a shell written in perl,
configured in perl and operated in perl. It is intended as a login shell
and development environment. It can be found at
-http://pardus-larus.student.utwente.nl/~pardus/projects/zoidberg/
-or your local CPAN mirror.
+https://metacpan.org/release/Zoidberg.
The C<Shell.pm> module (distributed with Perl) makes Perl try commands
which aren't part of the Perl language as shell commands. C<perlsh> from
From the command line, you can use the C<cpan> command's C<-l> switch:
- $ cpan -l
+ $ cpan -l
You can also use C<cpan>'s C<-a> switch to create an autobundle file
that C<CPAN.pm> understands and can use to re-install every module:
- $ cpan -a
+ $ cpan -a
-Inside a Perl program, you can use the C<ExtUtils::Installed> module to
+Inside a Perl program, you can use the L<ExtUtils::Installed> module to
show all installed distributions, although it can take awhile to do
its magic. The standard library which comes with Perl just shows up
-as "Perl" (although you can get those with C<Module::CoreList>).
+as "Perl" (although you can get those with L<Module::CoreList>).
- use ExtUtils::Installed;
+ use ExtUtils::Installed;
- my $inst = ExtUtils::Installed->new();
- my @modules = $inst->modules();
+ my $inst = ExtUtils::Installed->new();
+ my @modules = $inst->modules();
If you want a list of all of the Perl module filenames, you
-can use C<File::Find::Rule>:
+can use L<File::Find::Rule>:
- use File::Find::Rule;
+ use File::Find::Rule;
- my @files = File::Find::Rule->
- extras({follow => 1})->
- file()->
- name( '*.pm' )->
- in( @INC )
- ;
+ my @files = File::Find::Rule->
+ extras({follow => 1})->
+ file()->
+ name( '*.pm' )->
+ in( @INC )
+ ;
If you do not have that module, you can do the same thing
-with C<File::Find> which is part of the standard library:
+with L<File::Find> which is part of the standard library:
- use File::Find;
- my @files;
+ use File::Find;
+ my @files;
- find(
- {
- wanted => sub {
- push @files, $File::Find::fullname
- if -f $File::Find::fullname && /\.pm$/
- },
- follow => 1,
- follow_skip => 2,
- },
- @INC
- );
+ find(
+ {
+ wanted => sub {
+ push @files, $File::Find::fullname
+ if -f $File::Find::fullname && /\.pm$/
+ },
+ follow => 1,
+ follow_skip => 2,
+ },
+ @INC
+ );
- print join "\n", @files;
+ print join "\n", @files;
If you simply need to check quickly to see if a module is
available, you can check for its documentation. If you can
If you cannot read the documentation, the module might not
have any (in rare cases):
- $ perldoc Module::Name
+ $ perldoc Module::Name
You can also try to include the module in a one-liner to see if
perl finds it:
- $ perl -MModule::Name -e1
+ $ perl -MModule::Name -e1
=head2 How do I debug my Perl programs?
they get too big. You can find out more about these in L<strict>
and L<warnings>.
- #!/usr/bin/perl
- use strict;
- use warnings;
+ #!/usr/bin/perl
+ use strict;
+ use warnings;
Beyond that, the simplest debugger is the C<print> function. Use it
to look at values as you run your program:
- print STDERR "The value is [$value]\n";
+ print STDERR "The value is [$value]\n";
-The C<Data::Dumper> module can pretty-print Perl data structures:
+The L<Data::Dumper> module can pretty-print Perl data structures:
- use Data::Dumper qw( Dumper );
- print STDERR "The hash is " . Dumper( \%hash ) . "\n";
+ use Data::Dumper qw( Dumper );
+ print STDERR "The hash is " . Dumper( \%hash ) . "\n";
Perl comes with an interactive debugger, which you can start with the
C<-d> switch. It's fully explained in L<perldebug>.
-If you'd like a graphical user interface and you have C<Tk>, you can use
+If you'd like a graphical user interface and you have L<Tk>, you can use
C<ptkdb>. It's on CPAN and available for free.
If you need something much more sophisticated and controllable, Leon
-Brocard's C<Devel::ebug> (which you can call with the C<-D> switch as C<-Debug>)
+Brocard's L<Devel::ebug> (which you can call with the C<-D> switch as C<-Debug>)
gives you the programmatic hooks into everything you need to write your
own (without too much pain and suffering).
The C<Devel> namespace has several modules which you can use to
profile your Perl programs.
-The C<Devel::NYTProf> (New York Times Profiler) does both statement
+The L<Devel::NYTProf> (New York Times Profiler) does both statement
and subroutine profiling. It's available from CPAN and you also invoke
it with the C<-d> switch:
- perl -d:NYTProf some_perl.pl
+ perl -d:NYTProf some_perl.pl
It creates a database of the profile information that you can turn into
reports. The C<nytprofhtml> command turns the data into an HTML report
-similar to the C<Devel::Cover> report:
+similar to the L<Devel::Cover> report:
- nytprofhtml
+ nytprofhtml
CPAN has several other profilers that you can invoke in the same
-fashion. You might also be interested in using the C<Benchmark> to
+fashion. You might also be interested in using the L<Benchmark> to
measure and compare code snippets.
You can read more about profiling in I<Programming Perl>, chapter 20,
=head2 How do I cross-reference my Perl programs?
-The C<B::Xref> module can be used to generate cross-reference reports
+The L<B::Xref> module can be used to generate cross-reference reports
for Perl programs.
perl -MO=Xref[,OPTIONS] scriptname.plx
=head2 Is there a pretty-printer (formatter) for Perl?
-C<Perltidy> is a Perl script which indents and reformats Perl scripts
-to make them easier to read by trying to follow the rules of the
-L<perlstyle>. If you write Perl scripts, or spend much time reading
-them, you will probably find it useful. It is available at
-http://perltidy.sourceforge.net .
+L<Perl::Tidy> comes with a perl script L<perltidy> which indents and
+reformats Perl scripts to make them easier to read by trying to follow
+the rules of the L<perlstyle>. If you write Perl, or spend much time reading
+Perl, you will probably find it useful.
Of course, if you simply follow the guidelines in L<perlstyle>,
you shouldn't need to reformat. The habit of formatting your code
Don't read an entire file into memory if you can process it line
by line. Or more concretely, use a loop like this:
- #
- # Good Idea
- #
- while (<FILE>) {
- # ...
- }
+ #
+ # Good Idea
+ #
+ while (<FILE>) {
+ # ...
+ }
instead of this:
- #
- # Bad Idea
- #
- @data = <FILE>;
- foreach (@data) {
- # ...
- }
+ #
+ # Bad Idea
+ #
+ @data = <FILE>;
+ foreach (@data) {
+ # ...
+ }
When the files you're processing are small, it doesn't much matter which
way you do it, but it makes a huge difference when they start getting
Ditto for stringifying large arrays:
- {
- local $, = "\n";
- print @big_array;
- }
+ {
+ local $, = "\n";
+ print @big_array;
+ }
is much more memory-efficient than either
- print join "\n", @big_array;
+ print join "\n", @big_array;
or
- {
- local $" = "\n";
- print "@big_array";
- }
+ {
+ local $" = "\n";
+ print "@big_array";
+ }
=item * Pass by reference
everything works out right.
sub makeone {
- my @a = ( 1 .. 10 );
- return \@a;
+ my @a = ( 1 .. 10 );
+ return \@a;
}
for ( 1 .. 10 ) {
# display reasonable manpath
echo $PATH | perl -nl -072 -e '
- s![^/+]*$!man!&&-d&&!$s{$_}++&&push@m,$_;END{print"@m"}'
+ s![^/+]*$!man!&&-d&&!$s{$_}++&&push@m,$_;END{print"@m"}'
OK, the last one was actually an Obfuscated Perl Contest entry. :-)
when it runs fine on the command line", see the troubleshooting
guides and references in L<perlfaq9> or in the CGI MetaFAQ:
- http://www.perl.org/CGI_MetaFAQ.html
+ http://www.perl.org/CGI_MetaFAQ.html
=head2 Where can I learn about object-oriented Perl programming?
(contributed by brian d foy)
-The C<ExtUtils::MakeMaker> module, better known simply as "MakeMaker",
+The L<ExtUtils::MakeMaker> module, better known simply as "MakeMaker",
turns a Perl script, typically called C<Makefile.PL>, into a Makefile.
The Unix tool C<make> uses this file to manage dependencies and actions
to process and install a Perl distribution.
C<printf> or C<sprintf> function. See
L<perlop/"Floating-point Arithmetic"> for more details.
- printf "%.2f", 10/3;
+ printf "%.2f", 10/3;
- my $number = sprintf "%.2f", 10/3;
+ my $number = sprintf "%.2f", 10/3;
=head2 Why is int() broken?
For example, this
- print int(0.6/0.2-2), "\n";
+ print int(0.6/0.2-2), "\n";
will in most computers print 0, not 1, because even such simple
numbers as 0.6 and 0.2 cannot be presented exactly by floating-point
ignores leading spaces and zeroes, then assumes the rest of the digits
are in base 10:
- my $string = '0644';
+ my $string = '0644';
- print $string + 0; # prints 644
+ print $string + 0; # prints 644
- print $string + 44; # prints 688, certainly not octal!
+ print $string + 44; # prints 688, certainly not octal!
This problem usually involves one of the Perl built-ins that has the
same name a Unix command that uses octal numbers as arguments on the
command line. In this example, C<chmod> on the command line knows that
its first argument is octal because that's what it does:
- %prompt> chmod 644 file
+ %prompt> chmod 644 file
If you want to use the same literal digits (644) in Perl, you have to tell
Perl to treat them as octal numbers either by prefixing the digits with
a C<0> or using C<oct>:
- chmod( 0644, $file); # right, has leading zero
- chmod( oct(644), $file ); # also correct
+ chmod( 0644, $file); # right, has leading zero
+ chmod( oct(644), $file ); # also correct
The problem comes in when you take your numbers from something that Perl
thinks is a string, such as a command line argument in C<@ARGV>:
- chmod( $ARGV[0], $file); # wrong, even if "0644"
+ chmod( $ARGV[0], $file); # wrong, even if "0644"
- chmod( oct($ARGV[0]), $file ); # correct, treat string as octal
+ chmod( oct($ARGV[0]), $file ); # correct, treat string as octal
You can always check the value you're using by printing it in octal
notation to ensure it matches what you think it should be. Print it
in octal and decimal format:
- printf "0%o %d", $number, $number;
+ printf "0%o %d", $number, $number;
=head2 Does Perl have a round() function? What about ceil() and floor()? Trig functions?
certain number of digits, C<sprintf()> or C<printf()> is usually the
easiest route.
- printf("%.3f", 3.1415926535); # prints 3.142
+ printf("%.3f", 3.1415926535); # prints 3.142
-The C<POSIX> module (part of the standard Perl distribution)
+The Perltidy module (part of the standard Perl distribution)
implements C<ceil()>, C<floor()>, and a number of other mathematical
and trigonometric functions.
- use POSIX;
- $ceil = ceil(3.5); # 4
- $floor = floor(3.5); # 3
+ use POSIX;
+ $ceil = ceil(3.5); # 4
+ $floor = floor(3.5); # 3
-In 5.000 to 5.003 perls, trigonometry was done in the C<Math::Complex>
-module. With 5.004, the C<Math::Trig> module (part of the standard Perl
+In 5.000 to 5.003 perls, trigonometry was done in the L<Math::Complex>
+module. With 5.004, the L<Math::Trig> module (part of the standard Perl
distribution) implements the trigonometric functions. Internally it
-uses the C<Math::Complex> module and some functions can break out from
+uses the L<Math::Complex> module and some functions can break out from
the real axis into the complex plane, for example the inverse sine of
2.
To see why, notice how you'll still have an issue on half-way-point
alternation:
- for ($i = 0; $i < 1.01; $i += 0.05) { printf "%.1f ",$i}
+ for ($i = 0; $i < 1.01; $i += 0.05) { printf "%.1f ",$i}
- 0.0 0.1 0.1 0.2 0.2 0.2 0.3 0.3 0.4 0.4 0.5 0.5 0.6 0.7 0.7
- 0.8 0.8 0.9 0.9 1.0 1.0
+ 0.0 0.1 0.1 0.2 0.2 0.2 0.3 0.3 0.4 0.4 0.5 0.5 0.6 0.7 0.7
+ 0.8 0.8 0.9 0.9 1.0 1.0
Don't blame Perl. It's the same as in C. IEEE says we have to do
this. Perl numbers whose absolute values are integers under 2**31 (on
representations. This is intended to be representational rather than
exhaustive.
-Some of the examples later in L<perlfaq4> use the C<Bit::Vector>
-module from CPAN. The reason you might choose C<Bit::Vector> over the
+Some of the examples later in L<perlfaq4> use the L<Bit::Vector>
+module from CPAN. The reason you might choose L<Bit::Vector> over the
perl built-in functions is that it works with numbers of ANY size,
that it is optimized for speed on some operations, and for at least
some programmers the notation might be familiar.
Using perl's built in conversion of C<0x> notation:
- $dec = 0xDEADBEEF;
+ $dec = 0xDEADBEEF;
Using the C<hex> function:
- $dec = hex("DEADBEEF");
+ $dec = hex("DEADBEEF");
Using C<pack>:
- $dec = unpack("N", pack("H8", substr("0" x 8 . "DEADBEEF", -8)));
+ $dec = unpack("N", pack("H8", substr("0" x 8 . "DEADBEEF", -8)));
Using the CPAN module C<Bit::Vector>:
- use Bit::Vector;
- $vec = Bit::Vector->new_Hex(32, "DEADBEEF");
- $dec = $vec->to_Dec();
+ use Bit::Vector;
+ $vec = Bit::Vector->new_Hex(32, "DEADBEEF");
+ $dec = $vec->to_Dec();
=item How do I convert from decimal to hexadecimal
Using C<sprintf>:
- $hex = sprintf("%X", 3735928559); # upper case A-F
- $hex = sprintf("%x", 3735928559); # lower case a-f
+ $hex = sprintf("%X", 3735928559); # upper case A-F
+ $hex = sprintf("%x", 3735928559); # lower case a-f
Using C<unpack>:
- $hex = unpack("H*", pack("N", 3735928559));
+ $hex = unpack("H*", pack("N", 3735928559));
-Using C<Bit::Vector>:
+Using L<Bit::Vector>:
- use Bit::Vector;
- $vec = Bit::Vector->new_Dec(32, -559038737);
- $hex = $vec->to_Hex();
+ use Bit::Vector;
+ $vec = Bit::Vector->new_Dec(32, -559038737);
+ $hex = $vec->to_Hex();
-And C<Bit::Vector> supports odd bit counts:
+And L<Bit::Vector> supports odd bit counts:
- use Bit::Vector;
- $vec = Bit::Vector->new_Dec(33, 3735928559);
- $vec->Resize(32); # suppress leading 0 if unwanted
- $hex = $vec->to_Hex();
+ use Bit::Vector;
+ $vec = Bit::Vector->new_Dec(33, 3735928559);
+ $vec->Resize(32); # suppress leading 0 if unwanted
+ $hex = $vec->to_Hex();
=item How do I convert from octal to decimal
Using Perl's built in conversion of numbers with leading zeros:
- $dec = 033653337357; # note the leading 0!
+ $dec = 033653337357; # note the leading 0!
Using the C<oct> function:
- $dec = oct("33653337357");
+ $dec = oct("33653337357");
-Using C<Bit::Vector>:
+Using L<Bit::Vector>:
- use Bit::Vector;
- $vec = Bit::Vector->new(32);
- $vec->Chunk_List_Store(3, split(//, reverse "33653337357"));
- $dec = $vec->to_Dec();
+ use Bit::Vector;
+ $vec = Bit::Vector->new(32);
+ $vec->Chunk_List_Store(3, split(//, reverse "33653337357"));
+ $dec = $vec->to_Dec();
=item How do I convert from decimal to octal
Using C<sprintf>:
- $oct = sprintf("%o", 3735928559);
+ $oct = sprintf("%o", 3735928559);
-Using C<Bit::Vector>:
+Using L<Bit::Vector>:
- use Bit::Vector;
- $vec = Bit::Vector->new_Dec(32, -559038737);
- $oct = reverse join('', $vec->Chunk_List_Read(3));
+ use Bit::Vector;
+ $vec = Bit::Vector->new_Dec(32, -559038737);
+ $oct = reverse join('', $vec->Chunk_List_Read(3));
=item How do I convert from binary to decimal
Perl 5.6 lets you write binary numbers directly with
the C<0b> notation:
- $number = 0b10110110;
+ $number = 0b10110110;
Using C<oct>:
- my $input = "10110110";
- $decimal = oct( "0b$input" );
+ my $input = "10110110";
+ $decimal = oct( "0b$input" );
Using C<pack> and C<ord>:
- $decimal = ord(pack('B8', '10110110'));
+ $decimal = ord(pack('B8', '10110110'));
Using C<pack> and C<unpack> for larger strings:
- $int = unpack("N", pack("B32",
- substr("0" x 32 . "11110101011011011111011101111", -32)));
- $dec = sprintf("%d", $int);
+ $int = unpack("N", pack("B32",
+ substr("0" x 32 . "11110101011011011111011101111", -32)));
+ $dec = sprintf("%d", $int);
- # substr() is used to left-pad a 32-character string with zeros.
+ # substr() is used to left-pad a 32-character string with zeros.
-Using C<Bit::Vector>:
+Using L<Bit::Vector>:
- $vec = Bit::Vector->new_Bin(32, "11011110101011011011111011101111");
- $dec = $vec->to_Dec();
+ $vec = Bit::Vector->new_Bin(32, "11011110101011011011111011101111");
+ $dec = $vec->to_Dec();
=item How do I convert from decimal to binary
Using C<sprintf> (perl 5.6+):
- $bin = sprintf("%b", 3735928559);
+ $bin = sprintf("%b", 3735928559);
Using C<unpack>:
- $bin = unpack("B*", pack("N", 3735928559));
+ $bin = unpack("B*", pack("N", 3735928559));
-Using C<Bit::Vector>:
+Using L<Bit::Vector>:
- use Bit::Vector;
- $vec = Bit::Vector->new_Dec(32, -559038737);
- $bin = $vec->to_Bin();
+ use Bit::Vector;
+ $vec = Bit::Vector->new_Dec(32, -559038737);
+ $bin = $vec->to_Bin();
The remaining transformations (e.g. hex -> oct, bin -> hex, etc.)
are left as an exercise to the inclined reader.
to numbers explicitly (using C<0+$arg>). The rest arise because
the programmer says:
- if ("\020\020" & "\101\101") {
- # ...
- }
+ if ("\020\020" & "\101\101") {
+ # ...
+ }
but a string consisting of two null bytes (the result of C<"\020\020"
& "\101\101">) is not a false value in Perl. You need:
- if ( ("\020\020" & "\101\101") !~ /[^\000]/) {
- # ...
- }
+ if ( ("\020\020" & "\101\101") !~ /[^\000]/) {
+ # ...
+ }
=head2 How do I multiply matrices?
-Use the C<Math::Matrix> or C<Math::MatrixReal> modules (available from CPAN)
-or the C<PDL> extension (also available from CPAN).
+Use the L<Math::Matrix> or L<Math::MatrixReal> modules (available from CPAN)
+or the L<PDL> extension (also available from CPAN).
=head2 How do I perform an operation on a series of integers?
To call a function on each element in an array, and collect the
results, use:
- @results = map { my_func($_) } @array;
+ @results = map { my_func($_) } @array;
For example:
- @triple = map { 3 * $_ } @single;
+ @triple = map { 3 * $_ } @single;
To call a function on each element of an array, but ignore the
results:
- foreach $iterator (@array) {
- some_func($iterator);
- }
+ foreach $iterator (@array) {
+ some_func($iterator);
+ }
To call a function on each integer in a (small) range, you B<can> use:
- @results = map { some_func($_) } (5 .. 25);
+ @results = map { some_func($_) } (5 .. 25);
but you should be aware that the C<..> operator creates a list of
all integers in the range. This can take a lot of memory for large
ranges. Instead use:
- @results = ();
- for ($i=5; $i <= 500_005; $i++) {
- push(@results, some_func($i));
- }
+ @results = ();
+ for ($i=5; $i <= 500_005; $i++) {
+ push(@results, some_func($i));
+ }
This situation has been fixed in Perl5.005. Use of C<..> in a C<for>
loop will iterate over the range, without creating the entire range.
- for my $i (5 .. 500_005) {
- push(@results, some_func($i));
- }
+ for my $i (5 .. 500_005) {
+ push(@results, some_func($i));
+ }
will not create a list of 500,000 integers.
If you're using a version of Perl before 5.004, you must call C<srand>
once at the start of your program to seed the random number generator.
- BEGIN { srand() if $] < 5.004 }
+ BEGIN { srand() if $] < 5.004 }
5.004 and later automatically call C<srand> at the beginning. Don't
call C<srand> more than once--you make your numbers less random,
course, living in a state of sin."
If you want numbers that are more random than C<rand> with C<srand>
-provides, you should also check out the C<Math::TrulyRandom> module from
+provides, you should also check out the L<Math::TrulyRandom> module from
CPAN. It uses the imperfections in your system's timer to generate
random numbers, but this takes quite a while. If you want a better
pseudorandom generator than comes with your operating system, look at
That is, to get a number between 10 and 15, inclusive, you want a
random number between 0 and 5 that you can then add to 10.
- my $number = 10 + int rand( 15-10+1 ); # ( 10,11,12,13,14, or 15 )
+ my $number = 10 + int rand( 15-10+1 ); # ( 10,11,12,13,14, or 15 )
Hence you derive the following simple function to abstract
that. It selects a random integer between the two given
integers (inclusive), For example: C<random_int_between(50,120)>.
- sub random_int_between {
- my($min, $max) = @_;
- # Assumes that the two arguments are integers themselves!
- return $min if $min == $max;
- ($min, $max) = ($max, $min) if $min > $max;
- return $min + int rand(1 + $max - $min);
- }
+ sub random_int_between {
+ my($min, $max) = @_;
+ # Assumes that the two arguments are integers themselves!
+ return $min if $min == $max;
+ ($min, $max) = ($max, $min) if $min > $max;
+ return $min + int rand(1 + $max - $min);
+ }
=head1 Data: Dates
by the C<localtime> function. Without an
argument C<localtime> uses the current time.
- my $day_of_year = (localtime)[7];
+ my $day_of_year = (localtime)[7];
-The C<POSIX> module can also format a date as the day of the year or
+The L<POSIX> module can also format a date as the day of the year or
week of the year.
- use POSIX qw/strftime/;
- my $day_of_year = strftime "%j", localtime;
- my $week_of_year = strftime "%W", localtime;
+ use POSIX qw/strftime/;
+ my $day_of_year = strftime "%j", localtime;
+ my $week_of_year = strftime "%W", localtime;
-To get the day of year for any date, use C<POSIX>'s C<mktime> to get
+To get the day of year for any date, use L<POSIX>'s C<mktime> to get
a time in epoch seconds for the argument to C<localtime>.
- use POSIX qw/mktime strftime/;
- my $week_of_year = strftime "%W",
- localtime( mktime( 0, 0, 0, 18, 11, 87 ) );
+ use POSIX qw/mktime strftime/;
+ my $week_of_year = strftime "%W",
+ localtime( mktime( 0, 0, 0, 18, 11, 87 ) );
-You can also use C<Time::Piece>, which comes with Perl and provides a
+You can also use L<Time::Piece>, which comes with Perl and provides a
C<localtime> that returns an object:
- use Time::Piece;
- my $day_of_year = localtime->yday;
- my $week_of_year = localtime->week;
+ use Time::Piece;
+ my $day_of_year = localtime->yday;
+ my $week_of_year = localtime->week;
-The C<Date::Calc> module provides two functions to calculate these, too:
+The L<Date::Calc> module provides two functions to calculate these, too:
- use Date::Calc;
- my $day_of_year = Day_of_Year( 1987, 12, 18 );
- my $week_of_year = Week_of_Year( 1987, 12, 18 );
+ use Date::Calc;
+ my $day_of_year = Day_of_Year( 1987, 12, 18 );
+ my $week_of_year = Week_of_Year( 1987, 12, 18 );
=head2 How do I find the current century or millennium?
Use the following simple functions:
- sub get_century {
- return int((((localtime(shift || time))[5] + 1999))/100);
- }
+ sub get_century {
+ return int((((localtime(shift || time))[5] + 1999))/100);
+ }
- sub get_millennium {
- return 1+int((((localtime(shift || time))[5] + 1899))/1000);
- }
+ sub get_millennium {
+ return 1+int((((localtime(shift || time))[5] + 1899))/1000);
+ }
-On some systems, the C<POSIX> module's C<strftime()> function has been
+On some systems, the L<POSIX> module's C<strftime()> function has been
extended in a non-standard way to use a C<%C> format, which they
sometimes claim is the "century". It isn't, because on most such
systems, this is only the first two digits of the four-digit year, and
You could just store all your dates as a number and then subtract.
Life isn't always that simple though.
-The C<Time::Piece> module, which comes with Perl, replaces C<localtime>
+The L<Time::Piece> module, which comes with Perl, replaces L<localtime>
with a version that returns an object. It also overloads the comparison
operators so you can compare them directly:
- use Time::Piece;
- my $date1 = localtime( $some_time );
- my $date2 = localtime( $some_other_time );
+ use Time::Piece;
+ my $date1 = localtime( $some_time );
+ my $date2 = localtime( $some_other_time );
- if( $date1 < $date2 ) {
- print "The date was in the past\n";
- }
+ if( $date1 < $date2 ) {
+ print "The date was in the past\n";
+ }
You can also get differences with a subtraction, which returns a
-C<Time::Seconds> object:
+L<Time::Seconds> object:
- my $diff = $date1 - $date2;
- print "The difference is ", $date_diff->days, " days\n";
+ my $diff = $date1 - $date2;
+ print "The difference is ", $date_diff->days, " days\n";
-If you want to work with formatted dates, the C<Date::Manip>,
-C<Date::Calc>, or C<DateTime> modules can help you.
+If you want to work with formatted dates, the L<Date::Manip>,
+L<Date::Calc>, or L<DateTime> modules can help you.
=head2 How can I take a string and turn it into epoch seconds?
If it's a regular enough string that it always has the same format,
you can split it up and pass the parts to C<timelocal> in the standard
-C<Time::Local> module. Otherwise, you should look into the C<Date::Calc>,
-C<Date::Parse>, and C<Date::Manip> modules from CPAN.
+L<Time::Local> module. Otherwise, you should look into the L<Date::Calc>,
+L<Date::Parse>, and L<Date::Manip> modules from CPAN.
=head2 How can I find the Julian Day?
(contributed by brian d foy and Dave Cross)
-You can use the C<Time::Piece> module, part of the Standard Library,
+You can use the L<Time::Piece> module, part of the Standard Library,
which can convert a date/time to a Julian Day:
- $ perl -MTime::Piece -le 'print localtime->julian_day'
- 2455607.7959375
+ $ perl -MTime::Piece -le 'print localtime->julian_day'
+ 2455607.7959375
Or the modified Julian Day:
- $ perl -MTime::Piece -le 'print localtime->mjd'
- 55607.2961226851
+ $ perl -MTime::Piece -le 'print localtime->mjd'
+ 55607.2961226851
Or even the day of the year (which is what some people think of as a
Julian day):
- $ perl -MTime::Piece -le 'print localtime->yday'
- 45
+ $ perl -MTime::Piece -le 'print localtime->yday'
+ 45
-You can also do the same things with the C<DateTime> module:
+You can also do the same things with the L<DateTime> module:
- $ perl -MDateTime -le'print DateTime->today->jd'
- 2453401.5
- $ perl -MDateTime -le'print DateTime->today->mjd'
- 53401
- $ perl -MDateTime -le'print DateTime->today->doy'
- 31
+ $ perl -MDateTime -le'print DateTime->today->jd'
+ 2453401.5
+ $ perl -MDateTime -le'print DateTime->today->mjd'
+ 53401
+ $ perl -MDateTime -le'print DateTime->today->doy'
+ 31
-You can use the C<Time::JulianDay> module available on CPAN. Ensure
+You can use the L<Time::JulianDay> module available on CPAN. Ensure
that you really want to find a Julian day, though, as many people have
different ideas about Julian days (see http://www.hermetic.ch/cal_stud/jdn.htm
for instance):
- $ perl -MTime::JulianDay -le 'print local_julian_day( time )'
- 55608
+ $ perl -MTime::JulianDay -le 'print local_julian_day( time )'
+ 55608
=head2 How do I find yesterday's date?
X<date> X<yesterday> X<DateTime> X<Date::Calc> X<Time::Local>
(contributed by brian d foy)
To do it correctly, you can use one of the C<Date> modules since they
-work with calendars instead of times. The C<DateTime> module makes it
+work with calendars instead of times. The L<DateTime> module makes it
simple, and give you the same time of day, only the day before,
despite daylight saving time changes:
- use DateTime;
+ use DateTime;
- my $yesterday = DateTime->now->subtract( days => 1 );
+ my $yesterday = DateTime->now->subtract( days => 1 );
- print "Yesterday was $yesterday\n";
+ print "Yesterday was $yesterday\n";
-You can also use the C<Date::Calc> module using its C<Today_and_Now>
+You can also use the L<Date::Calc> module using its C<Today_and_Now>
function.
- use Date::Calc qw( Today_and_Now Add_Delta_DHMS );
+ use Date::Calc qw( Today_and_Now Add_Delta_DHMS );
- my @date_time = Add_Delta_DHMS( Today_and_Now(), -1, 0, 0, 0 );
+ my @date_time = Add_Delta_DHMS( Today_and_Now(), -1, 0, 0, 0 );
- print "@date_time\n";
+ print "@date_time\n";
Most people try to use the time rather than the calendar to figure out
dates, but that assumes that days are twenty-four hours each. For
and from summer time throws this off. For example, the rest of the
suggestions will be wrong sometimes:
-Starting with Perl 5.10, C<Time::Piece> and C<Time::Seconds> are part
+Starting with Perl 5.10, L<Time::Piece> and L<Time::Seconds> are part
of the standard distribution, so you might think that you could do
something like this:
- use Time::Piece;
- use Time::Seconds;
+ use Time::Piece;
+ use Time::Seconds;
- my $yesterday = localtime() - ONE_DAY; # WRONG
- print "Yesterday was $yesterday\n";
+ my $yesterday = localtime() - ONE_DAY; # WRONG
+ print "Yesterday was $yesterday\n";
-The C<Time::Piece> module exports a new C<localtime> that returns an
-object, and C<Time::Seconds> exports the C<ONE_DAY> constant that is a
+The L<Time::Piece> module exports a new C<localtime> that returns an
+object, and L<Time::Seconds> exports the C<ONE_DAY> constant that is a
set number of seconds. This means that it always gives the time 24
hours ago, which is not always yesterday. This can cause problems
around the end of daylight saving time when there's one day that is 25
hours long.
-You have the same problem with C<Time::Local>, which will give the wrong
+You have the same problem with L<Time::Local>, which will give the wrong
answer for those same special cases:
- # contributed by Gunnar Hjalmarsson
- use Time::Local;
- my $today = timelocal 0, 0, 12, ( localtime )[3..5];
- my ($d, $m, $y) = ( localtime $today-86400 )[3..5]; # WRONG
- printf "Yesterday: %d-%02d-%02d\n", $y+1900, $m+1, $d;
+ # contributed by Gunnar Hjalmarsson
+ use Time::Local;
+ my $today = timelocal 0, 0, 12, ( localtime )[3..5];
+ my ($d, $m, $y) = ( localtime $today-86400 )[3..5]; # WRONG
+ printf "Yesterday: %d-%02d-%02d\n", $y+1900, $m+1, $d;
=head2 Does Perl have a Year 2000 or 2038 problem? Is Perl Y2K compliant?
03:14:08 January 19, 2038, when a 32-bit based time would overflow. You
still might get a warning on a 32-bit C<perl>:
- % perl5.12 -E 'say scalar localtime( 0x9FFF_FFFFFFFF )'
- Integer overflow in hexadecimal number at -e line 1.
- Wed Nov 1 19:42:39 5576711
+ % perl5.12 -E 'say scalar localtime( 0x9FFF_FFFFFFFF )'
+ Integer overflow in hexadecimal number at -e line 1.
+ Wed Nov 1 19:42:39 5576711
On a 64-bit C<perl>, you can get even larger dates for those really long
running projects:
- % perl5.12 -E 'say scalar gmtime( 0x9FFF_FFFFFFFF )'
- Thu Nov 2 00:42:39 5576711
+ % perl5.12 -E 'say scalar gmtime( 0x9FFF_FFFFFFFF )'
+ Thu Nov 2 00:42:39 5576711
You're still out of luck if you need to keep track of decaying protons
though.
There are many ways to ensure that values are what you expect or
want to accept. Besides the specific examples that we cover in the
perlfaq, you can also look at the modules with "Assert" and "Validate"
-in their names, along with other modules such as C<Regexp::Common>.
+in their names, along with other modules such as L<Regexp::Common>.
Some modules have validation for particular types of input, such
-as C<Business::ISBN>, C<Business::CreditCard>, C<Email::Valid>,
-and C<Data::Validate::IP>.
+as L<Business::ISBN>, L<Business::CreditCard>, L<Email::Valid>,
+and L<Data::Validate::IP>.
=head2 How do I unescape a string?
with in L<perlfaq9>. Shell escapes with the backslash (C<\>)
character are removed with
- s/\\(.)/$1/g;
+ s/\\(.)/$1/g;
This won't expand C<"\n"> or C<"\t"> or any other special escapes.
that to require that the same thing immediately follow it. We replace
that part of the string with the character in C<$1>.
- s/(.)\g1/$1/g;
+ s/(.)\g1/$1/g;
We can also use the transliteration operator, C<tr///>. In this
example, the search list side of our C<tr///> contains nothing, but
duplicated and consecutive characters in the string so a character
does not show up next to itself
- my $str = 'Haarlem'; # in the Netherlands
- $str =~ tr///cs; # Now Harlem, like in New York
+ my $str = 'Haarlem'; # in the Netherlands
+ $str =~ tr///cs; # Now Harlem, like in New York
=head2 How do I expand function calls in a string?
have more than one return value, we can construct and dereference an
anonymous array. In this case, we call the function in list context.
- print "The time values are @{ [localtime] }.\n";
+ print "The time values are @{ [localtime] }.\n";
If we want to call the function in scalar context, we have to do a bit
more work. We can really have any code we like inside the braces, so
the use of parens creates a list context, so we need C<scalar> to
force the scalar context on the function:
- print "The time is ${\(scalar localtime)}.\n"
+ print "The time is ${\(scalar localtime)}.\n"
- print "The time is ${ my $x = localtime; \$x }.\n";
+ print "The time is ${ my $x = localtime; \$x }.\n";
If your function already returns a reference, you don't need to create
the reference yourself.
- sub timestamp { my $t = localtime; \$t }
+ sub timestamp { my $t = localtime; \$t }
- print "The time is ${ timestamp() }.\n";
+ print "The time is ${ timestamp() }.\n";
The C<Interpolation> module can also do a lot of magic for you. You can
specify a variable name, in this case C<E>, to set up a tied hash that
does the interpolation for you. It has several other methods to do this
as well.
- use Interpolation E => 'eval';
- print "The time values are $E{localtime()}.\n";
+ use Interpolation E => 'eval';
+ print "The time values are $E{localtime()}.\n";
In most cases, it is probably easier to simply use string concatenation,
which also forces scalar context.
- print "The time is " . localtime() . ".\n";
+ print "The time is " . localtime() . ".\n";
=head2 How do I find matching/nesting anything?
If you are serious about writing a parser, there are a number of
modules or oddities that will make your life a lot easier. There are
-the CPAN modules C<Parse::RecDescent>, C<Parse::Yapp>, and
-C<Text::Balanced>; and the C<byacc> program. Starting from perl 5.8
-the C<Text::Balanced> is part of the standard distribution.
+the CPAN modules L<Parse::RecDescent>, L<Parse::Yapp>, and
+L<Text::Balanced>; and the C<byacc> program. Starting from perl 5.8
+the L<Text::Balanced> is part of the standard distribution.
One simple destructive, inside-out approach that you might try is to
pull out the smallest nesting parts one at a time:
- while (s/BEGIN((?:(?!BEGIN)(?!END).)*)END//gs) {
- # do something with $1
- }
+ while (s/BEGIN((?:(?!BEGIN)(?!END).)*)END//gs) {
+ # do something with $1
+ }
A more complicated and sneaky approach is to make Perl's regular
expression engine do it for you. This is courtesy Dean Inada, and
rather has the nature of an Obfuscated Perl Contest entry, but it
really does work:
- # $_ contains the string to parse
- # BEGIN and END are the opening and closing markers for the
- # nested text.
+ # $_ contains the string to parse
+ # BEGIN and END are the opening and closing markers for the
+ # nested text.
- @( = ('(','');
- @) = (')','');
- ($re=$_)=~s/((BEGIN)|(END)|.)/$)[!$3]\Q$1\E$([!$2]/gs;
- @$ = (eval{/$re/},$@!~/unmatched/i);
- print join("\n",@$[0..$#$]) if( $$[-1] );
+ @( = ('(','');
+ @) = (')','');
+ ($re=$_)=~s/((BEGIN)|(END)|.)/$)[!$3]\Q$1\E$([!$2]/gs;
+ @$ = (eval{/$re/},$@!~/unmatched/i);
+ print join("\n",@$[0..$#$]) if( $$[-1] );
=head2 How do I reverse a string?
Use C<reverse()> in scalar context, as documented in
L<perlfunc/reverse>.
- $reversed = reverse $string;
+ $reversed = reverse $string;
=head2 How do I expand tabs in a string?
You can do it yourself:
- 1 while $string =~ s/\t+/' ' x (length($&) * 8 - length($`) % 8)/e;
+ 1 while $string =~ s/\t+/' ' x (length($&) * 8 - length($`) % 8)/e;
-Or you can just use the C<Text::Tabs> module (part of the standard Perl
+Or you can just use the L<Text::Tabs> module (part of the standard Perl
distribution).
- use Text::Tabs;
- @expanded_lines = expand(@lines_with_tabs);
+ use Text::Tabs;
+ @expanded_lines = expand(@lines_with_tabs);
=head2 How do I reformat a paragraph?
-Use C<Text::Wrap> (part of the standard Perl distribution):
+Use L<Text::Wrap> (part of the standard Perl distribution):
- use Text::Wrap;
- print wrap("\t", ' ', @paragraphs);
+ use Text::Wrap;
+ print wrap("\t", ' ', @paragraphs);
-The paragraphs you give to C<Text::Wrap> should not contain embedded
-newlines. C<Text::Wrap> doesn't justify the lines (flush-right).
+The paragraphs you give to L<Text::Wrap> should not contain embedded
+newlines. L<Text::Wrap> doesn't justify the lines (flush-right).
-Or use the CPAN module C<Text::Autoformat>. Formatting files can be
+Or use the CPAN module L<Text::Autoformat>. Formatting files can be
easily done by making a shell alias, like so:
- alias fmt="perl -i -MText::Autoformat -n0777 \
- -e 'print autoformat $_, {all=>1}' $*"
+ alias fmt="perl -i -MText::Autoformat -n0777 \
+ -e 'print autoformat $_, {all=>1}' $*"
-See the documentation for C<Text::Autoformat> to appreciate its many
+See the documentation for L<Text::Autoformat> to appreciate its many
capabilities.
=head2 How can I access or change N characters of a string?
and grab the string of length 1.
- $string = "Just another Perl Hacker";
- $first_char = substr( $string, 0, 1 ); # 'J'
+ $string = "Just another Perl Hacker";
+ $first_char = substr( $string, 0, 1 ); # 'J'
To change part of a string, you can use the optional fourth
argument which is the replacement string.
- substr( $string, 13, 4, "Perl 5.8.0" );
+ substr( $string, 13, 4, "Perl 5.8.0" );
You can also use substr() as an lvalue.
- substr( $string, 13, 4 ) = "Perl 5.8.0";
+ substr( $string, 13, 4 ) = "Perl 5.8.0";
=head2 How do I change the Nth occurrence of something?
C<"whosoever"> or C<"whomsoever">, case insensitively. These
all assume that $_ contains the string to be altered.
- $count = 0;
- s{((whom?)ever)}{
- ++$count == 5 # is it the 5th?
- ? "${2}soever" # yes, swap
- : $1 # renege and leave it there
- }ige;
+ $count = 0;
+ s{((whom?)ever)}{
+ ++$count == 5 # is it the 5th?
+ ? "${2}soever" # yes, swap
+ : $1 # renege and leave it there
+ }ige;
In the more general case, you can use the C</g> modifier in a C<while>
loop, keeping count of matches.
- $WANT = 3;
- $count = 0;
- $_ = "One fish two fish red fish blue fish";
- while (/(\w+)\s+fish\b/gi) {
- if (++$count == $WANT) {
- print "The third fish is a $1 one.\n";
- }
- }
+ $WANT = 3;
+ $count = 0;
+ $_ = "One fish two fish red fish blue fish";
+ while (/(\w+)\s+fish\b/gi) {
+ if (++$count == $WANT) {
+ print "The third fish is a $1 one.\n";
+ }
+ }
That prints out: C<"The third fish is a red one."> You can also use a
repetition count and repeated pattern like this:
- /(?:\w+\s+fish\s+){2}(\w+)\s+fish/i;
+ /(?:\w+\s+fish\s+){2}(\w+)\s+fish/i;
=head2 How can I count the number of occurrences of a substring within a string?
count of a certain single character (X) within a string, you can use the
C<tr///> function like so:
- $string = "ThisXlineXhasXsomeXx'sXinXit";
- $count = ($string =~ tr/X//);
- print "There are $count X characters in the string";
+ $string = "ThisXlineXhasXsomeXx'sXinXit";
+ $count = ($string =~ tr/X//);
+ print "There are $count X characters in the string";
This is fine if you are just looking for a single character. However,
if you are trying to count multiple character substrings within a
loop around a global pattern match. For example, let's count negative
integers:
- $string = "-9 55 48 -2 23 -76 4 14 -44";
- while ($string =~ /-\d+/g) { $count++ }
- print "There are $count negative numbers in the string";
+ $string = "-9 55 48 -2 23 -76 4 14 -44";
+ while ($string =~ /-\d+/g) { $count++ }
+ print "There are $count negative numbers in the string";
Another version uses a global match in list context, then assigns the
result to a scalar, producing a count of the number of matches.
- $count = () = $string =~ /-\d+/g;
+ $count = () = $string =~ /-\d+/g;
=head2 How do I capitalize all the words on one line?
X<Text::Autoformat> X<capitalize> X<case, title> X<case, sentence>
Damian Conway's L<Text::Autoformat> handles all of the thinking
for you.
- use Text::Autoformat;
- my $x = "Dr. Strangelove or: How I Learned to Stop ".
- "Worrying and Love the Bomb";
+ use Text::Autoformat;
+ my $x = "Dr. Strangelove or: How I Learned to Stop ".
+ "Worrying and Love the Bomb";
- print $x, "\n";
- for my $style (qw( sentence title highlight )) {
- print autoformat($x, { case => $style }), "\n";
- }
+ print $x, "\n";
+ for my $style (qw( sentence title highlight )) {
+ print autoformat($x, { case => $style }), "\n";
+ }
How do you want to capitalize those words?
- FRED AND BARNEY'S LODGE # all uppercase
- Fred And Barney's Lodge # title case
- Fred and Barney's Lodge # highlight case
+ FRED AND BARNEY'S LODGE # all uppercase
+ Fred And Barney's Lodge # title case
+ Fred and Barney's Lodge # highlight case
It's not as easy a problem as it looks. How many words do you think
are in there? Wait for it... wait for it.... If you answered 5
you want to capitalize. How is Perl supposed to know not to capitalize
that C<s> after the apostrophe? You could try a regular expression:
- $string =~ s/ (
- (^\w) #at the beginning of the line
- | # or
- (\s\w) #preceded by whitespace
- )
- /\U$1/xg;
+ $string =~ s/ (
+ (^\w) #at the beginning of the line
+ | # or
+ (\s\w) #preceded by whitespace
+ )
+ /\U$1/xg;
- $string =~ s/([\w']+)/\u\L$1/g;
+ $string =~ s/([\w']+)/\u\L$1/g;
Now, what if you don't want to capitalize that "and"? Just use
L<Text::Autoformat> and get on with the next problem. :)
=head2 How can I split a [character]-delimited string except when inside [character]?
-Several modules can handle this sort of parsing--C<Text::Balanced>,
-C<Text::CSV>, C<Text::CSV_XS>, and C<Text::ParseWords>, among others.
+Several modules can handle this sort of parsing--L<Text::Balanced>,
+L<Text::CSV>, L<Text::CSV_XS>, and L<Text::ParseWords>, among others.
Take the example case of trying to split a string that is
comma-separated into its different fields. You can't use C<split(/,/)>
because you shouldn't split if the comma is inside quotes. For
example, take a data line like this:
- SAR001,"","Cimetrix, Inc","Bob Smith","CAM",N,8,1,0,7,"Error, Core Dumped"
+ SAR001,"","Cimetrix, Inc","Bob Smith","CAM",N,8,1,0,7,"Error, Core Dumped"
Due to the restriction of the quotes, this is a fairly complex
problem. Thankfully, we have Jeffrey Friedl, author of
I<Mastering Regular Expressions>, to handle these for us. He
suggests (assuming your string is contained in C<$text>):
- @new = ();
- push(@new, $+) while $text =~ m{
- "([^\"\\]*(?:\\.[^\"\\]*)*)",? # groups the phrase inside the quotes
- | ([^,]+),?
- | ,
- }gx;
- push(@new, undef) if substr($text,-1,1) eq ',';
+ @new = ();
+ push(@new, $+) while $text =~ m{
+ "([^\"\\]*(?:\\.[^\"\\]*)*)",? # groups the phrase inside the quotes
+ | ([^,]+),?
+ | ,
+ }gx;
+ push(@new, undef) if substr($text,-1,1) eq ',';
If you want to represent quotation marks inside a
quotation-mark-delimited field, escape them with backslashes (eg,
C<"like \"this\"">.
-Alternatively, the C<Text::ParseWords> module (part of the standard
+Alternatively, the L<Text::ParseWords> module (part of the standard
Perl distribution) lets you say:
- use Text::ParseWords;
- @new = quotewords(",", 0, $text);
+ use Text::ParseWords;
+ @new = quotewords(",", 0, $text);
=head2 How do I strip blank space from the beginning/end of a string?
replace all the leading or trailing whitespace with nothing. You
can do that with a pair of substitutions:
- s/^\s+//;
- s/\s+$//;
+ s/^\s+//;
+ s/\s+$//;
You can also write that as a single substitution, although it turns
out the combined statement is slower than the separate ones. That
might not matter to you, though:
- s/^\s+|\s+$//g;
+ s/^\s+|\s+$//g;
In this regular expression, the alternation matches either at the
beginning or the end of the string since the anchors have a lower
"blank" (consisting entirely of whitespace) lines which the C<^\s+>
would remove all by itself:
- while( <> ) {
- s/^\s+|\s+$//g;
- print "$_\n";
- }
+ while( <> ) {
+ s/^\s+|\s+$//g;
+ print "$_\n";
+ }
For a multi-line string, you can apply the regular expression to each
logical line in the string by adding the C</m> flag (for
embedded newline, so it doesn't remove it. This pattern still removes
the newline at the end of the string:
- $string =~ s/^\s+|\s+$//gm;
+ $string =~ s/^\s+|\s+$//gm;
Remember that lines consisting entirely of whitespace will disappear,
since the first part of the alternation can match the entire string
you have to do a little more work. Instead of matching any whitespace
(since that includes a newline), just match the other whitespace:
- $string =~ s/^[\t\f ]+|[\t\f ]+$//mg;
+ $string =~ s/^[\t\f ]+|[\t\f ]+$//mg;
=head2 How do I pad a string with blanks or pad a number with zeroes?
right with blanks and it will truncate the result to a maximum length of
C<$pad_len>.
- # Left padding a string with blanks (no truncation):
- $padded = sprintf("%${pad_len}s", $text);
- $padded = sprintf("%*s", $pad_len, $text); # same thing
+ # Left padding a string with blanks (no truncation):
+ $padded = sprintf("%${pad_len}s", $text);
+ $padded = sprintf("%*s", $pad_len, $text); # same thing
- # Right padding a string with blanks (no truncation):
- $padded = sprintf("%-${pad_len}s", $text);
- $padded = sprintf("%-*s", $pad_len, $text); # same thing
+ # Right padding a string with blanks (no truncation):
+ $padded = sprintf("%-${pad_len}s", $text);
+ $padded = sprintf("%-*s", $pad_len, $text); # same thing
- # Left padding a number with 0 (no truncation):
- $padded = sprintf("%0${pad_len}d", $num);
- $padded = sprintf("%0*d", $pad_len, $num); # same thing
+ # Left padding a number with 0 (no truncation):
+ $padded = sprintf("%0${pad_len}d", $num);
+ $padded = sprintf("%0*d", $pad_len, $num); # same thing
- # Right padding a string with blanks using pack (will truncate):
- $padded = pack("A$pad_len",$text);
+ # Right padding a string with blanks using pack (will truncate):
+ $padded = pack("A$pad_len",$text);
If you need to pad with a character other than blank or zero you can use
one of the following methods. They all generate a pad string with the
Left and right padding with any character, creating a new string:
- $padded = $pad_char x ( $pad_len - length( $text ) ) . $text;
- $padded = $text . $pad_char x ( $pad_len - length( $text ) );
+ $padded = $pad_char x ( $pad_len - length( $text ) ) . $text;
+ $padded = $text . $pad_char x ( $pad_len - length( $text ) );
Left and right padding with any character, modifying C<$text> directly:
- substr( $text, 0, 0 ) = $pad_char x ( $pad_len - length( $text ) );
- $text .= $pad_char x ( $pad_len - length( $text ) );
+ substr( $text, 0, 0 ) = $pad_char x ( $pad_len - length( $text ) );
+ $text .= $pad_char x ( $pad_len - length( $text ) );
=head2 How do I extract selected columns from a string?
If you know the columns that contain the data, you can
use C<substr> to extract a single column.
- my $column = substr( $line, $start_column, $length );
+ my $column = substr( $line, $start_column, $length );
You can use C<split> if the columns are separated by whitespace or
some other delimiter, as long as whitespace or the delimiter cannot
appear as part of the data.
- my $line = ' fred barney betty ';
- my @columns = split /\s+/, $line;
- # ( '', 'fred', 'barney', 'betty' );
+ my $line = ' fred barney betty ';
+ my @columns = split /\s+/, $line;
+ # ( '', 'fred', 'barney', 'betty' );
- my $line = 'fred||barney||betty';
- my @columns = split /\|/, $line;
- # ( 'fred', '', 'barney', '', 'betty' );
+ my $line = 'fred||barney||betty';
+ my @columns = split /\|/, $line;
+ # ( 'fred', '', 'barney', '', 'betty' );
If you want to work with comma-separated values, don't do this since
that format is a bit more complicated. Use one of the modules that
-handle that format, such as C<Text::CSV>, C<Text::CSV_XS>, or
-C<Text::CSV_PP>.
+handle that format, such as L<Text::CSV>, L<Text::CSV_XS>, or
+L<Text::CSV_PP>.
If you want to break apart an entire line of fixed columns, you can use
C<unpack> with the A (ASCII) format. By using a number after the format
specifier, you can denote the column width. See the C<pack> and C<unpack>
entries in L<perlfunc> for more details.
- my @fields = unpack( $line, "A8 A8 A8 A16 A4" );
+ my @fields = unpack( $line, "A8 A8 A8 A16 A4" );
Note that spaces in the format argument to C<unpack> do not denote literal
spaces. If you have space separated data, you may want C<split> instead.
(contributed by brian d foy)
-You can use the Text::Soundex module. If you want to do fuzzy or close
-matching, you might also try the C<String::Approx>, and
-C<Text::Metaphone>, and C<Text::DoubleMetaphone> modules.
+You can use the C<Text::Soundex> module. If you want to do fuzzy or close
+matching, you might also try the L<String::Approx>, and
+L<Text::Metaphone>, and L<Text::DoubleMetaphone> modules.
=head2 How can I expand variables in text strings?
(contributed by brian d foy)
If you can avoid it, don't, or if you can use a templating system,
-such as C<Text::Template> or C<Template> Toolkit, do that instead. You
+such as L<Text::Template> or L<Template> Toolkit, do that instead. You
might even be able to get the job done with C<sprintf> or C<printf>:
- my $string = sprintf 'Say hello to %s and %s', $foo, $bar;
+ my $string = sprintf 'Say hello to %s and %s', $foo, $bar;
However, for the one-off simple case where I don't want to pull out a
full templating system, I'll use a string that has two Perl scalar
variables in it. In this example, I want to expand C<$foo> and C<$bar>
to their variable's values:
- my $foo = 'Fred';
- my $bar = 'Barney';
- $string = 'Say hello to $foo and $bar';
+ my $foo = 'Fred';
+ my $bar = 'Barney';
+ $string = 'Say hello to $foo and $bar';
One way I can do this involves the substitution operator and a double
C</e> flag. The first C</e> evaluates C<$1> on the replacement side and
it with its value. C<$foo>, then, turns into 'Fred', and that's finally
what's left in the string:
- $string =~ s/(\$\w+)/$1/eeg; # 'Say hello to Fred and Barney'
+ $string =~ s/(\$\w+)/$1/eeg; # 'Say hello to Fred and Barney'
The C</e> will also silently ignore violations of strict, replacing
undefined variable names with the empty string. Since I'm using the
can replace the missing value with a marker, in this case C<???> to
signal that I missed something:
- my $string = 'This has $foo and $bar';
+ my $string = 'This has $foo and $bar';
- my %Replacements = (
- foo => 'Fred',
- );
+ my %Replacements = (
+ foo => 'Fred',
+ );
- # $string =~ s/\$(\w+)/$Replacements{$1}/g;
- $string =~ s/\$(\w+)/
- exists $Replacements{$1} ? $Replacements{$1} : '???'
- /eg;
+ # $string =~ s/\$(\w+)/$Replacements{$1}/g;
+ $string =~ s/\$(\w+)/
+ exists $Replacements{$1} ? $Replacements{$1} : '???'
+ /eg;
- print $string;
+ print $string;
=head2 What's wrong with always quoting "$vars"?
If you get used to writing odd things like these:
- print "$var"; # BAD
- $new = "$old"; # BAD
- somefunc("$var"); # BAD
+ print "$var"; # BAD
+ $new = "$old"; # BAD
+ somefunc("$var"); # BAD
You'll be in trouble. Those should (in 99.8% of the cases) be
the simpler and more direct:
- print $var;
- $new = $old;
- somefunc($var);
+ print $var;
+ $new = $old;
+ somefunc($var);
Otherwise, besides slowing you down, you're going to break code when
the thing in the scalar is actually neither a string nor a number, but
a reference:
- func(\@array);
- sub func {
- my $aref = shift;
- my $oref = "$aref"; # WRONG
- }
+ func(\@array);
+ sub func {
+ my $aref = shift;
+ my $oref = "$aref"; # WRONG
+ }
You can also get into subtle problems on those few operations in Perl
that actually do care about the difference between a string and a
Stringification also destroys arrays.
- @lines = `command`;
- print "@lines"; # WRONG - extra blanks
- print @lines; # right
+ @lines = `command`;
+ print "@lines"; # WRONG - extra blanks
+ print @lines; # right
=head2 Why don't my E<lt>E<lt>HERE documents work?
This works with leading special strings, dynamically determined:
- $remember_the_main = fix<<' MAIN_INTERPRETER_LOOP';
- @@@ int
- @@@ runops() {
- @@@ SAVEI32(runlevel);
- @@@ runlevel++;
- @@@ while ( op = (*op->op_ppaddr)() );
- @@@ TAINT_NOT;
- @@@ return 0;
- @@@ }
- MAIN_INTERPRETER_LOOP
+ $remember_the_main = fix<<' MAIN_INTERPRETER_LOOP';
+ @@@ int
+ @@@ runops() {
+ @@@ SAVEI32(runlevel);
+ @@@ runlevel++;
+ @@@ while ( op = (*op->op_ppaddr)() );
+ @@@ TAINT_NOT;
+ @@@ return 0;
+ @@@ }
+ MAIN_INTERPRETER_LOOP
Or with a fixed amount of leading whitespace, with remaining
indentation correctly preserved:
- $poem = fix<<EVER_ON_AND_ON;
+ $poem = fix<<EVER_ON_AND_ON;
Now far ahead the Road has gone,
- And I must follow, if I can,
+ And I must follow, if I can,
Pursuing it with eager feet,
- Until it joins some larger way
+ Until it joins some larger way
Where many paths and errands meet.
- And whither then? I cannot say.
- --Bilbo in /usr/src/perl/pp_ctl.c
- EVER_ON_AND_ON
+ And whither then? I cannot say.
+ --Bilbo in /usr/src/perl/pp_ctl.c
+ EVER_ON_AND_ON
=head1 Data: Arrays
holds a variable collection of scalars. An array can supply its collection
for list operations, so list operations also work on arrays:
- # slices
- ( 'dog', 'cat', 'bird' )[2,3];
- @animals[2,3];
+ # slices
+ ( 'dog', 'cat', 'bird' )[2,3];
+ @animals[2,3];
- # iteration
- foreach ( qw( dog cat bird ) ) { ... }
- foreach ( @animals ) { ... }
+ # iteration
+ foreach ( qw( dog cat bird ) ) { ... }
+ foreach ( @animals ) { ... }
- my @three = grep { length == 3 } qw( dog cat bird );
- my @three = grep { length == 3 } @animals;
+ my @three = grep { length == 3 } qw( dog cat bird );
+ my @three = grep { length == 3 } @animals;
- # supply an argument list
- wash_animals( qw( dog cat bird ) );
- wash_animals( @animals );
+ # supply an argument list
+ wash_animals( qw( dog cat bird ) );
+ wash_animals( @animals );
Array operations, which change the scalars, rearranges them, or adds
or subtracts some scalars, only work on arrays. These can't work on a
An array can also change its length:
- $#animals = 1; # truncate to two elements
- $#animals = 10000; # pre-extend to 10,001 elements
+ $#animals = 1; # truncate to two elements
+ $#animals = 10000; # pre-extend to 10,001 elements
You can change an array element, but you can't change a list element:
- $animals[0] = 'Rottweiler';
- qw( dog cat bird )[0] = 'Rottweiler'; # syntax error!
+ $animals[0] = 'Rottweiler';
+ qw( dog cat bird )[0] = 'Rottweiler'; # syntax error!
- foreach ( @animals ) {
- s/^d/fr/; # works fine
- }
+ foreach ( @animals ) {
+ s/^d/fr/; # works fine
+ }
- foreach ( qw( dog cat bird ) ) {
- s/^d/fr/; # Error! Modification of read only value!
- }
+ foreach ( qw( dog cat bird ) ) {
+ s/^d/fr/; # Error! Modification of read only value!
+ }
However, if the list element is itself a variable, it appears that you
can change a list element. However, the list element is the variable, not
a scalar to get the number of elements in the array. This only works
for arrays, though:
- my $count = @animals; # only works with arrays
+ my $count = @animals; # only works with arrays
If you try to do the same thing with what you think is a list, you
get a quite different result. Although it looks like you have a list
on the righthand side, Perl actually sees a bunch of scalars separated
by a comma:
- my $scalar = ( 'dog', 'cat', 'bird' ); # $scalar gets bird
+ my $scalar = ( 'dog', 'cat', 'bird' ); # $scalar gets bird
Since you're assigning to a scalar, the righthand side is in scalar
context. The comma operator (yes, it's an operator!) in scalar
people mess this up because they choose a list-lookalike whose
last element is also the count they expect:
- my $scalar = ( 1, 2, 3 ); # $scalar gets 3, accidentally
+ my $scalar = ( 1, 2, 3 ); # $scalar gets 3, accidentally
=head2 What is the difference between $array[1] and @array[1]?
For instance, if you want to read a single line from a filehandle,
assigning to a scalar value is fine:
- $array[1] = <STDIN>;
+ $array[1] = <STDIN>;
However, in list context, the line input operator returns all of the
lines as a list. The first line goes into C<@array[1]> and the rest
of the lines mysteriously disappear:
- @array[1] = <STDIN>; # most likely not what you want
+ @array[1] = <STDIN>; # most likely not what you want
Either the C<use warnings> pragma or the B<-w> flag will warn you when
you use an array slice with a single index.
create that hash: just that you use C<keys> to get the unique
elements.
- my %hash = map { $_, 1 } @array;
- # or a hash slice: @hash{ @array } = ();
- # or a foreach: $hash{$_} = 1 foreach ( @array );
+ my %hash = map { $_, 1 } @array;
+ # or a hash slice: @hash{ @array } = ();
+ # or a foreach: $hash{$_} = 1 foreach ( @array );
- my @unique = keys %hash;
+ my @unique = keys %hash;
If you want to use a module, try the C<uniq> function from
-C<List::MoreUtils>. In list context it returns the unique elements,
+L<List::MoreUtils>. In list context it returns the unique elements,
preserving their order in the list. In scalar context, it returns the
number of unique elements.
- use List::MoreUtils qw(uniq);
+ use List::MoreUtils qw(uniq);
- my @unique = uniq( 1, 2, 3, 4, 4, 5, 6, 5, 7 ); # 1,2,3,4,5,6,7
- my $unique = uniq( 1, 2, 3, 4, 4, 5, 6, 5, 7 ); # 7
+ my @unique = uniq( 1, 2, 3, 4, 4, 5, 6, 5, 7 ); # 1,2,3,4,5,6,7
+ my $unique = uniq( 1, 2, 3, 4, 4, 5, 6, 5, 7 ); # 7
You can also go through each element and skip the ones you've seen
before. Use a hash to keep track. The first time the loop sees an
C<undef>), so the next skips that iteration and the loop goes to the
next element.
- my @unique = ();
- my %seen = ();
+ my @unique = ();
+ my %seen = ();
- foreach my $elem ( @array )
- {
- next if $seen{ $elem }++;
- push @unique, $elem;
- }
+ foreach my $elem ( @array ) {
+ next if $seen{ $elem }++;
+ push @unique, $elem;
+ }
You can write this more briefly using a grep, which does the
same thing.
- my %seen = ();
- my @unique = grep { ! $seen{ $_ }++ } @array;
+ my %seen = ();
+ my @unique = grep { ! $seen{ $_ }++ } @array;
=head2 How can I tell whether a certain element is contained in a list or array?
and later, you can use the smart match operator to check that an item is
contained in an array or a hash:
- use 5.010;
+ use 5.010;
- if( $item ~~ @array )
- {
- say "The array contains $item"
- }
+ if( $item ~~ @array ) {
+ say "The array contains $item"
+ }
- if( $item ~~ %hash )
- {
- say "The hash contains $item"
- }
+ if( $item ~~ %hash ) {
+ say "The hash contains $item"
+ }
With earlier versions of Perl, you have to do a bit more work. If you
are going to make this query many times over arbitrary string values,
the fastest way is probably to invert the original array and maintain a
hash whose keys are the first array's values:
- @blues = qw/azure cerulean teal turquoise lapis-lazuli/;
- %is_blue = ();
- for (@blues) { $is_blue{$_} = 1 }
+ @blues = qw/azure cerulean teal turquoise lapis-lazuli/;
+ %is_blue = ();
+ for (@blues) { $is_blue{$_} = 1 }
Now you can check whether C<$is_blue{$some_color}>. It might have
been a good idea to keep the blues all in a hash in the first place.
If the values are all small integers, you could use a simple indexed
array. This kind of an array will take up less space:
- @primes = (2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31);
- @is_tiny_prime = ();
- for (@primes) { $is_tiny_prime[$_] = 1 }
- # or simply @istiny_prime[@primes] = (1) x @primes;
+ @primes = (2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31);
+ @is_tiny_prime = ();
+ for (@primes) { $is_tiny_prime[$_] = 1 }
+ # or simply @istiny_prime[@primes] = (1) x @primes;
Now you check whether $is_tiny_prime[$some_number].
If the values in question are integers instead of strings, you can save
quite a lot of space by using bit strings instead:
- @articles = ( 1..10, 150..2000, 2017 );
- undef $read;
- for (@articles) { vec($read,$_,1) = 1 }
+ @articles = ( 1..10, 150..2000, 2017 );
+ undef $read;
+ for (@articles) { vec($read,$_,1) = 1 }
Now check whether C<vec($read,$n,1)> is true for some C<$n>.
of the original list or array. They only pay off if you have to test
multiple values against the same array.
-If you are testing only once, the standard module C<List::Util> exports
+If you are testing only once, the standard module L<List::Util> exports
the function C<first> for this purpose. It works by stopping once it
finds the element. It's written in C for speed, and its Perl equivalent
looks like this subroutine:
- sub first (&@) {
- my $code = shift;
- foreach (@_) {
- return $_ if &{$code}();
- }
- undef;
- }
+ sub first (&@) {
+ my $code = shift;
+ foreach (@_) {
+ return $_ if &{$code}();
+ }
+ undef;
+ }
If speed is of little concern, the common idiom uses grep in scalar context
(which returns the number of items that passed its condition) to traverse the
entire list. This does have the benefit of telling you how many matches it
found, though.
- my $is_there = grep $_ eq $whatever, @array;
+ my $is_there = grep $_ eq $whatever, @array;
If you want to actually extract the matching elements, simply use grep in
list context.
- my @matches = grep $_ eq $whatever, @array;
+ my @matches = grep $_ eq $whatever, @array;
=head2 How do I compute the difference of two arrays? How do I compute the intersection of two arrays?
Use a hash. Here's code to do both and more. It assumes that each
element is unique in a given array:
- @union = @intersection = @difference = ();
- %count = ();
- foreach $element (@array1, @array2) { $count{$element}++ }
- foreach $element (keys %count) {
- push @union, $element;
- push @{ $count{$element} > 1 ? \@intersection : \@difference }, $element;
- }
+ @union = @intersection = @difference = ();
+ %count = ();
+ foreach $element (@array1, @array2) { $count{$element}++ }
+ foreach $element (keys %count) {
+ push @union, $element;
+ push @{ $count{$element} > 1 ? \@intersection : \@difference }, $element;
+ }
Note that this is the I<symmetric difference>, that is, all elements
in either A or in B but not in both. Think of it as an xor operation.
With Perl 5.10 and later, the smart match operator can give you the answer
with the least amount of work:
- use 5.010;
+ use 5.010;
- if( @array1 ~~ @array2 )
- {
- say "The arrays are the same";
- }
+ if( @array1 ~~ @array2 ) {
+ say "The arrays are the same";
+ }
- if( %hash1 ~~ %hash2 ) # doesn't check values!
- {
- say "The hash keys are the same";
- }
+ if( %hash1 ~~ %hash2 ) # doesn't check values! {
+ say "The hash keys are the same";
+ }
The following code works for single-level arrays. It uses a
stringwise comparison, and does not distinguish defined versus
undefined empty strings. Modify if you have other needs.
- $are_equal = compare_arrays(\@frogs, \@toads);
+ $are_equal = compare_arrays(\@frogs, \@toads);
- sub compare_arrays {
- my ($first, $second) = @_;
- no warnings; # silence spurious -w undef complaints
- return 0 unless @$first == @$second;
- for (my $i = 0; $i < @$first; $i++) {
- return 0 if $first->[$i] ne $second->[$i];
- }
- return 1;
- }
+ sub compare_arrays {
+ my ($first, $second) = @_;
+ no warnings; # silence spurious -w undef complaints
+ return 0 unless @$first == @$second;
+ for (my $i = 0; $i < @$first; $i++) {
+ return 0 if $first->[$i] ne $second->[$i];
+ }
+ return 1;
+ }
For multilevel structures, you may wish to use an approach more
-like this one. It uses the CPAN module C<FreezeThaw>:
+like this one. It uses the CPAN module L<FreezeThaw>:
- use FreezeThaw qw(cmpStr);
- @a = @b = ( "this", "that", [ "more", "stuff" ] );
+ use FreezeThaw qw(cmpStr);
+ @a = @b = ( "this", "that", [ "more", "stuff" ] );
- printf "a and b contain %s arrays\n",
- cmpStr(\@a, \@b) == 0
- ? "the same"
- : "different";
+ printf "a and b contain %s arrays\n",
+ cmpStr(\@a, \@b) == 0
+ ? "the same"
+ : "different";
This approach also works for comparing hashes. Here we'll demonstrate
two different answers:
- use FreezeThaw qw(cmpStr cmpStrHard);
+ use FreezeThaw qw(cmpStr cmpStrHard);
- %a = %b = ( "this" => "that", "extra" => [ "more", "stuff" ] );
- $a{EXTRA} = \%b;
- $b{EXTRA} = \%a;
+ %a = %b = ( "this" => "that", "extra" => [ "more", "stuff" ] );
+ $a{EXTRA} = \%b;
+ $b{EXTRA} = \%a;
- printf "a and b contain %s hashes\n",
- cmpStr(\%a, \%b) == 0 ? "the same" : "different";
+ printf "a and b contain %s hashes\n",
+ cmpStr(\%a, \%b) == 0 ? "the same" : "different";
- printf "a and b contain %s hashes\n",
- cmpStrHard(\%a, \%b) == 0 ? "the same" : "different";
+ printf "a and b contain %s hashes\n",
+ cmpStrHard(\%a, \%b) == 0 ? "the same" : "different";
The first reports that both those the hashes contain the same data,
=head2 How do I find the first array element for which a condition is true?
To find the first array element which satisfies a condition, you can
-use the C<first()> function in the C<List::Util> module, which comes
+use the C<first()> function in the L<List::Util> module, which comes
with Perl 5.8. This example finds the first element that contains
"Perl".
- use List::Util qw(first);
+ use List::Util qw(first);
- my $element = first { /Perl/ } @array;
+ my $element = first { /Perl/ } @array;
-If you cannot use C<List::Util>, you can make your own loop to do the
+If you cannot use L<List::Util>, you can make your own loop to do the
same thing. Once you find the element, you stop the loop with last.
- my $found;
- foreach ( @array ) {
- if( /Perl/ ) { $found = $_; last }
- }
+ my $found;
+ foreach ( @array ) {
+ if( /Perl/ ) { $found = $_; last }
+ }
If you want the array index, you can iterate through the indices
and check the array element at each index until you find one
that satisfies the condition.
- my( $found, $index ) = ( undef, -1 );
- for( $i = 0; $i < @array; $i++ ) {
- if( $array[$i] =~ /Perl/ ) {
- $found = $array[$i];
- $index = $i;
- last;
- }
- }
+ my( $found, $index ) = ( undef, -1 );
+ for( $i = 0; $i < @array; $i++ ) {
+ if( $array[$i] =~ /Perl/ ) {
+ $found = $array[$i];
+ $index = $i;
+ last;
+ }
+ }
=head2 How do I handle linked lists?
Steve Lembark goes through the details in his YAPC::NA 2009 talk "Perly
Linked Lists" ( http://www.slideshare.net/lembark/perly-linked-lists ),
-although you can just use his C<LinkedList::Single> module.
+although you can just use his L<LinkedList::Single> module.
=head2 How do I handle circular lists?
X<circular> X<array> X<Tie::Cycle> X<Array::Iterator::Circular>
If you want to cycle through an array endlessly, you can increment the
index modulo the number of elements in the array:
- my @array = qw( a b c );
- my $i = 0;
+ my @array = qw( a b c );
+ my $i = 0;
- while( 1 ) {
- print $array[ $i++ % @array ], "\n";
- last if $i > 20;
- }
+ while( 1 ) {
+ print $array[ $i++ % @array ], "\n";
+ last if $i > 20;
+ }
-You can also use C<Tie::Cycle> to use a scalar that always has the
+You can also use L<Tie::Cycle> to use a scalar that always has the
next element of the circular array:
- use Tie::Cycle;
+ use Tie::Cycle;
- tie my $cycle, 'Tie::Cycle', [ qw( FFFFFF 000000 FFFF00 ) ];
+ tie my $cycle, 'Tie::Cycle', [ qw( FFFFFF 000000 FFFF00 ) ];
- print $cycle; # FFFFFF
- print $cycle; # 000000
- print $cycle; # FFFF00
+ print $cycle; # FFFFFF
+ print $cycle; # 000000
+ print $cycle; # FFFF00
-The C<Array::Iterator::Circular> creates an iterator object for
+The L<Array::Iterator::Circular> creates an iterator object for
circular arrays:
- use Array::Iterator::Circular;
+ use Array::Iterator::Circular;
- my $color_iterator = Array::Iterator::Circular->new(
- qw(red green blue orange)
- );
+ my $color_iterator = Array::Iterator::Circular->new(
+ qw(red green blue orange)
+ );
- foreach ( 1 .. 20 ) {
- print $color_iterator->next, "\n";
- }
+ foreach ( 1 .. 20 ) {
+ print $color_iterator->next, "\n";
+ }
=head2 How do I shuffle an array randomly?
If you either have Perl 5.8.0 or later installed, or if you have
Scalar-List-Utils 1.03 or later installed, you can say:
- use List::Util 'shuffle';
+ use List::Util 'shuffle';
- @shuffled = shuffle(@list);
+ @shuffled = shuffle(@list);
If not, you can use a Fisher-Yates shuffle.
- sub fisher_yates_shuffle {
- my $deck = shift; # $deck is a reference to an array
- return unless @$deck; # must not be empty!
+ sub fisher_yates_shuffle {
+ my $deck = shift; # $deck is a reference to an array
+ return unless @$deck; # must not be empty!
- my $i = @$deck;
- while (--$i) {
- my $j = int rand ($i+1);
- @$deck[$i,$j] = @$deck[$j,$i];
- }
- }
+ my $i = @$deck;
+ while (--$i) {
+ my $j = int rand ($i+1);
+ @$deck[$i,$j] = @$deck[$j,$i];
+ }
+ }
- # shuffle my mpeg collection
- #
- my @mpeg = <audio/*/*.mp3>;
- fisher_yates_shuffle( \@mpeg ); # randomize @mpeg in place
- print @mpeg;
+ # shuffle my mpeg collection
+ #
+ my @mpeg = <audio/*/*.mp3>;
+ fisher_yates_shuffle( \@mpeg ); # randomize @mpeg in place
+ print @mpeg;
Note that the above implementation shuffles an array in place,
unlike the C<List::Util::shuffle()> which takes a list and returns
You've probably seen shuffling algorithms that work using splice,
randomly picking another element to swap the current element with
- srand;
- @new = ();
- @old = 1 .. 10; # just a demo
- while (@old) {
- push(@new, splice(@old, rand @old, 1));
- }
+ srand;
+ @new = ();
+ @old = 1 .. 10; # just a demo
+ while (@old) {
+ push(@new, splice(@old, rand @old, 1));
+ }
This is bad because splice is already O(N), and since you do it N
times, you just invented a quadratic algorithm; that is, O(N**2).
Use C<for>/C<foreach>:
- for (@lines) {
- s/foo/bar/; # change that word
- tr/XZ/ZX/; # swap those letters
- }
+ for (@lines) {
+ s/foo/bar/; # change that word
+ tr/XZ/ZX/; # swap those letters
+ }
Here's another; let's compute spherical volumes:
- for (@volumes = @radii) { # @volumes has changed parts
- $_ **= 3;
- $_ *= (4/3) * 3.14159; # this will be constant folded
- }
+ for (@volumes = @radii) { # @volumes has changed parts
+ $_ **= 3;
+ $_ *= (4/3) * 3.14159; # this will be constant folded
+ }
which can also be done with C<map()> which is made to transform
one list into another:
- @volumes = map {$_ ** 3 * (4/3) * 3.14159} @radii;
+ @volumes = map {$_ ** 3 * (4/3) * 3.14159} @radii;
If you want to do the same thing to modify the values of the
hash, you can use the C<values> function. As of Perl 5.6
the values are not copied, so if you modify $orbit (in this
case), you modify the value.
- for $orbit ( values %orbits ) {
- ($orbit **= 3) *= (4/3) * 3.14159;
- }
+ for $orbit ( values %orbits ) {
+ ($orbit **= 3) *= (4/3) * 3.14159;
+ }
Prior to perl 5.6 C<values> returned copies of the values,
so older perl code often contains constructions such as
Use the C<rand()> function (see L<perlfunc/rand>):
- $index = rand @array;
- $element = $array[$index];
+ $index = rand @array;
+ $element = $array[$index];
Or, simply:
- my $element = $array[ rand @array ];
+ my $element = $array[ rand @array ];
=head2 How do I permute N elements of a list?
X<List::Permutor> X<permute> X<Algorithm::Loops> X<Knuth>
X<The Art of Computer Programming> X<Fischer-Krause>
-Use the C<List::Permutor> module on CPAN. If the list is actually an
-array, try the C<Algorithm::Permute> module (also on CPAN). It's
+Use the L<List::Permutor> module on CPAN. If the list is actually an
+array, try the L<Algorithm::Permute> module (also on CPAN). It's
written in XS code and is very efficient:
- use Algorithm::Permute;
+ use Algorithm::Permute;
- my @array = 'a'..'d';
- my $p_iterator = Algorithm::Permute->new ( \@array );
+ my @array = 'a'..'d';
+ my $p_iterator = Algorithm::Permute->new ( \@array );
- while (my @perm = $p_iterator->next) {
- print "next permutation: (@perm)\n";
- }
+ while (my @perm = $p_iterator->next) {
+ print "next permutation: (@perm)\n";
+ }
For even faster execution, you could do:
- use Algorithm::Permute;
+ use Algorithm::Permute;
- my @array = 'a'..'d';
+ my @array = 'a'..'d';
- Algorithm::Permute::permute {
- print "next permutation: (@array)\n";
- } @array;
+ Algorithm::Permute::permute {
+ print "next permutation: (@array)\n";
+ } @array;
Here's a little program that generates all permutations of all the
words on each line of input. The algorithm embodied in the
C<permute()> function is discussed in Volume 4 (still unpublished) of
Knuth's I<The Art of Computer Programming> and will work on any list:
- #!/usr/bin/perl -n
- # Fischer-Krause ordered permutation generator
-
- sub permute (&@) {
- my $code = shift;
- my @idx = 0..$#_;
- while ( $code->(@_[@idx]) ) {
- my $p = $#idx;
- --$p while $idx[$p-1] > $idx[$p];
- my $q = $p or return;
- push @idx, reverse splice @idx, $p;
- ++$q while $idx[$p-1] > $idx[$q];
- @idx[$p-1,$q]=@idx[$q,$p-1];
- }
- }
-
- permute { print "@_\n" } split;
-
-The C<Algorithm::Loops> module also provides the C<NextPermute> and
+ #!/usr/bin/perl -n
+ # Fischer-Krause ordered permutation generator
+
+ sub permute (&@) {
+ my $code = shift;
+ my @idx = 0..$#_;
+ while ( $code->(@_[@idx]) ) {
+ my $p = $#idx;
+ --$p while $idx[$p-1] > $idx[$p];
+ my $q = $p or return;
+ push @idx, reverse splice @idx, $p;
+ ++$q while $idx[$p-1] > $idx[$q];
+ @idx[$p-1,$q]=@idx[$q,$p-1];
+ }
+ }
+
+ permute { print "@_\n" } split;
+
+The L<Algorithm::Loops> module also provides the C<NextPermute> and
C<NextPermuteNum> functions which efficiently find all unique permutations
of an array, even if it contains duplicate values, modifying it in-place:
if its elements are in reverse-sorted order then the array is reversed,
C<NextPermute> uses string order and C<NextPermuteNum> numeric order, so
you can enumerate all the permutations of C<0..9> like this:
- use Algorithm::Loops qw(NextPermuteNum);
+ use Algorithm::Loops qw(NextPermuteNum);
my @list= 0..9;
do { print "@list\n" } while NextPermuteNum @list;
Supply a comparison function to sort() (described in L<perlfunc/sort>):
- @list = sort { $a <=> $b } @list;
+ @list = sort { $a <=> $b } @list;
The default sort function is cmp, string comparison, which would
sort C<(1, 2, 10)> into C<(1, 10, 2)>. C<< <=> >>, used above, is
after the first number on each item, and then sort those words
case-insensitively.
- @idx = ();
- for (@data) {
- ($item) = /\d+\s*(\S+)/;
- push @idx, uc($item);
- }
- @sorted = @data[ sort { $idx[$a] cmp $idx[$b] } 0 .. $#idx ];
+ @idx = ();
+ for (@data) {
+ ($item) = /\d+\s*(\S+)/;
+ push @idx, uc($item);
+ }
+ @sorted = @data[ sort { $idx[$a] cmp $idx[$b] } 0 .. $#idx ];
which could also be written this way, using a trick
that's come to be known as the Schwartzian Transform:
- @sorted = map { $_->[0] }
- sort { $a->[1] cmp $b->[1] }
- map { [ $_, uc( (/\d+\s*(\S+)/)[0]) ] } @data;
+ @sorted = map { $_->[0] }
+ sort { $a->[1] cmp $b->[1] }
+ map { [ $_, uc( (/\d+\s*(\S+)/)[0]) ] } @data;
If you need to sort on several fields, the following paradigm is useful.
- @sorted = sort {
- field1($a) <=> field1($b) ||
- field2($a) cmp field2($b) ||
- field3($a) cmp field3($b)
- } @data;
+ @sorted = sort {
+ field1($a) <=> field1($b) ||
+ field2($a) cmp field2($b) ||
+ field3($a) cmp field3($b)
+ } @data;
This can be conveniently combined with precalculation of keys as given
above.
array of bits to a string, use C<vec()> to set the right bits. This
sets C<$vec> to have bit N set only if C<$ints[N]> was set:
- @ints = (...); # array of bits, e.g. ( 1, 0, 0, 1, 1, 0 ... )
- $vec = '';
- foreach( 0 .. $#ints ) {
- vec($vec,$_,1) = 1 if $ints[$_];
- }
+ @ints = (...); # array of bits, e.g. ( 1, 0, 0, 1, 1, 0 ... )
+ $vec = '';
+ foreach( 0 .. $#ints ) {
+ vec($vec,$_,1) = 1 if $ints[$_];
+ }
The string C<$vec> only takes up as many bits as it needs. For
instance, if you had 16 entries in C<@ints>, C<$vec> only needs two
Here's how, given a vector in C<$vec>, you can get those bits into
your C<@ints> array:
- sub bitvec_to_list {
- my $vec = shift;
- my @ints;
- # Find null-byte density then select best algorithm
- if ($vec =~ tr/\0// / length $vec > 0.95) {
- use integer;
- my $i;
-
- # This method is faster with mostly null-bytes
- while($vec =~ /[^\0]/g ) {
- $i = -9 + 8 * pos $vec;
- push @ints, $i if vec($vec, ++$i, 1);
- push @ints, $i if vec($vec, ++$i, 1);
- push @ints, $i if vec($vec, ++$i, 1);
- push @ints, $i if vec($vec, ++$i, 1);
- push @ints, $i if vec($vec, ++$i, 1);
- push @ints, $i if vec($vec, ++$i, 1);
- push @ints, $i if vec($vec, ++$i, 1);
- push @ints, $i if vec($vec, ++$i, 1);
- }
- }
- else {
- # This method is a fast general algorithm
- use integer;
- my $bits = unpack "b*", $vec;
- push @ints, 0 if $bits =~ s/^(\d)// && $1;
- push @ints, pos $bits while($bits =~ /1/g);
- }
-
- return \@ints;
- }
+ sub bitvec_to_list {
+ my $vec = shift;
+ my @ints;
+ # Find null-byte density then select best algorithm
+ if ($vec =~ tr/\0// / length $vec > 0.95) {
+ use integer;
+ my $i;
+
+ # This method is faster with mostly null-bytes
+ while($vec =~ /[^\0]/g ) {
+ $i = -9 + 8 * pos $vec;
+ push @ints, $i if vec($vec, ++$i, 1);
+ push @ints, $i if vec($vec, ++$i, 1);
+ push @ints, $i if vec($vec, ++$i, 1);
+ push @ints, $i if vec($vec, ++$i, 1);
+ push @ints, $i if vec($vec, ++$i, 1);
+ push @ints, $i if vec($vec, ++$i, 1);
+ push @ints, $i if vec($vec, ++$i, 1);
+ push @ints, $i if vec($vec, ++$i, 1);
+ }
+ }
+ else {
+ # This method is a fast general algorithm
+ use integer;
+ my $bits = unpack "b*", $vec;
+ push @ints, 0 if $bits =~ s/^(\d)// && $1;
+ push @ints, pos $bits while($bits =~ /1/g);
+ }
+
+ return \@ints;
+ }
This method gets faster the more sparse the bit vector is.
(Courtesy of Tim Bunce and Winfried Koenig.)
You can make the while loop a lot shorter with this suggestion
from Benjamin Goldberg:
- while($vec =~ /[^\0]+/g ) {
- push @ints, grep vec($vec, $_, 1), $-[0] * 8 .. $+[0] * 8;
- }
+ while($vec =~ /[^\0]+/g ) {
+ push @ints, grep vec($vec, $_, 1), $-[0] * 8 .. $+[0] * 8;
+ }
-Or use the CPAN module C<Bit::Vector>:
+Or use the CPAN module L<Bit::Vector>:
- $vector = Bit::Vector->new($num_of_bits);
- $vector->Index_List_Store(@ints);
- @ints = $vector->Index_List_Read();
+ $vector = Bit::Vector->new($num_of_bits);
+ $vector->Index_List_Store(@ints);
+ @ints = $vector->Index_List_Read();
-C<Bit::Vector> provides efficient methods for bit vector, sets of
+L<Bit::Vector> provides efficient methods for bit vector, sets of
small integers and "big int" math.
Here's a more extensive illustration using vec():
- # vec demo
- $vector = "\xff\x0f\xef\xfe";
- print "Ilya's string \\xff\\x0f\\xef\\xfe represents the number ",
- unpack("N", $vector), "\n";
- $is_set = vec($vector, 23, 1);
- print "Its 23rd bit is ", $is_set ? "set" : "clear", ".\n";
- pvec($vector);
-
- set_vec(1,1,1);
- set_vec(3,1,1);
- set_vec(23,1,1);
-
- set_vec(3,1,3);
- set_vec(3,2,3);
- set_vec(3,4,3);
- set_vec(3,4,7);
- set_vec(3,8,3);
- set_vec(3,8,7);
-
- set_vec(0,32,17);
- set_vec(1,32,17);
-
- sub set_vec {
- my ($offset, $width, $value) = @_;
- my $vector = '';
- vec($vector, $offset, $width) = $value;
- print "offset=$offset width=$width value=$value\n";
- pvec($vector);
- }
-
- sub pvec {
- my $vector = shift;
- my $bits = unpack("b*", $vector);
- my $i = 0;
- my $BASE = 8;
-
- print "vector length in bytes: ", length($vector), "\n";
- @bytes = unpack("A8" x length($vector), $bits);
- print "bits are: @bytes\n\n";
- }
+ # vec demo
+ $vector = "\xff\x0f\xef\xfe";
+ print "Ilya's string \\xff\\x0f\\xef\\xfe represents the number ",
+ unpack("N", $vector), "\n";
+ $is_set = vec($vector, 23, 1);
+ print "Its 23rd bit is ", $is_set ? "set" : "clear", ".\n";
+ pvec($vector);
+
+ set_vec(1,1,1);
+ set_vec(3,1,1);
+ set_vec(23,1,1);
+
+ set_vec(3,1,3);
+ set_vec(3,2,3);
+ set_vec(3,4,3);
+ set_vec(3,4,7);
+ set_vec(3,8,3);
+ set_vec(3,8,7);
+
+ set_vec(0,32,17);
+ set_vec(1,32,17);
+
+ sub set_vec {
+ my ($offset, $width, $value) = @_;
+ my $vector = '';
+ vec($vector, $offset, $width) = $value;
+ print "offset=$offset width=$width value=$value\n";
+ pvec($vector);
+ }
+
+ sub pvec {
+ my $vector = shift;
+ my $bits = unpack("b*", $vector);
+ my $i = 0;
+ my $BASE = 8;
+
+ print "vector length in bytes: ", length($vector), "\n";
+ @bytes = unpack("A8" x length($vector), $bits);
+ print "bits are: @bytes\n\n";
+ }
=head2 Why does defined() return true on empty arrays and hashes?
all of the keys of the hash and gives them back to you as a list. You
can then get the value through the particular key you're processing:
- foreach my $key ( keys %hash ) {
- my $value = $hash{$key}
- ...
- }
+ foreach my $key ( keys %hash ) {
+ my $value = $hash{$key}
+ ...
+ }
Once you have the list of keys, you can process that list before you
process the hash elements. For instance, you can sort the keys so you
can process them in lexical order:
- foreach my $key ( sort keys %hash ) {
- my $value = $hash{$key}
- ...
- }
+ foreach my $key ( sort keys %hash ) {
+ my $value = $hash{$key}
+ ...
+ }
Or, you might want to only process some of the items. If you only want
to deal with the keys that start with C<text:>, you can select just
those using C<grep>:
- foreach my $key ( grep /^text:/, keys %hash ) {
- my $value = $hash{$key}
- ...
- }
+ foreach my $key ( grep /^text:/, keys %hash ) {
+ my $value = $hash{$key}
+ ...
+ }
If the hash is very large, you might not want to create a long list of
keys. To save some memory, you can grab one key-value pair at a time using
C<each()>, which returns a pair you haven't seen yet:
- while( my( $key, $value ) = each( %hash ) ) {
- ...
- }
+ while( my( $key, $value ) = each( %hash ) ) {
+ ...
+ }
The C<each> operator returns the pairs in apparently random order, so if
ordering matters to you, you'll have to stick with the C<keys> method.
C<%new_hash> gives you a chance to decide what to do with the
duplicates:
- my %new_hash = %hash1; # make a copy; leave %hash1 alone
-
- foreach my $key2 ( keys %hash2 )
- {
- if( exists $new_hash{$key2} )
- {
- warn "Key [$key2] is in both hashes!";
- # handle the duplicate (perhaps only warning)
- ...
- next;
- }
- else
- {
- $new_hash{$key2} = $hash2{$key2};
- }
- }
+ my %new_hash = %hash1; # make a copy; leave %hash1 alone
+
+ foreach my $key2 ( keys %hash2 ) {
+ if( exists $new_hash{$key2} ) {
+ warn "Key [$key2] is in both hashes!";
+ # handle the duplicate (perhaps only warning)
+ ...
+ next;
+ }
+ else {
+ $new_hash{$key2} = $hash2{$key2};
+ }
+ }
If you don't want to create a new hash, you can still use this looping
technique; just change the C<%new_hash> to C<%hash1>.
- foreach my $key2 ( keys %hash2 )
- {
- if( exists $hash1{$key2} )
- {
- warn "Key [$key2] is in both hashes!";
- # handle the duplicate (perhaps only warning)
- ...
- next;
- }
- else
- {
- $hash1{$key2} = $hash2{$key2};
- }
- }
+ foreach my $key2 ( keys %hash2 ) {
+ if( exists $hash1{$key2} ) {
+ warn "Key [$key2] is in both hashes!";
+ # handle the duplicate (perhaps only warning)
+ ...
+ next;
+ }
+ else {
+ $hash1{$key2} = $hash2{$key2};
+ }
+ }
If you don't care that one hash overwrites keys and values from the other, you
could just use a hash slice to add one hash to another. In this case, values
from C<%hash2> replace values from C<%hash1> when they have keys in common:
- @hash1{ keys %hash2 } = values %hash2;
+ @hash1{ keys %hash2 } = values %hash2;
=head2 What happens if I add or remove keys from a hash while iterating over it?
Create a reverse hash:
- %by_value = reverse %by_key;
- $key = $by_value{$value};
+ %by_value = reverse %by_key;
+ $key = $by_value{$value};
That's not particularly efficient. It would be more space-efficient
to use:
- while (($key, $value) = each %by_key) {
- $by_value{$value} = $key;
- }
+ while (($key, $value) = each %by_key) {
+ $by_value{$value} = $key;
+ }
If your hash could have repeated values, the methods above will only find
one of the associated keys. This may or may not worry you. If it does
worry you, you can always reverse the hash into a hash of arrays instead:
- while (($key, $value) = each %by_key) {
- push @{$key_list_by_value{$value}}, $key;
- }
+ while (($key, $value) = each %by_key) {
+ push @{$key_list_by_value{$value}}, $key;
+ }
=head2 How can I know how many entries are in a hash?
You can use the C<keys()> built-in function in scalar context to find out
have many entries you have in a hash:
- my $key_count = keys %hash; # must be scalar context!
+ my $key_count = keys %hash; # must be scalar context!
If you want to find out how many entries have a defined value, that's
a bit different. You have to check each value. A C<grep> is handy:
- my $defined_value_count = grep { defined } values %hash;
+ my $defined_value_count = grep { defined } values %hash;
You can use that same structure to count the entries any way that
you like. If you want the count of the keys with vowels in them,
you just test for that instead:
- my $vowel_count = grep { /[aeiou]/ } keys %hash;
+ my $vowel_count = grep { /[aeiou]/ } keys %hash;
The C<grep> in scalar context returns the count. If you want the list
of matching items, just use it in list context instead:
- my @defined_values = grep { defined } values %hash;
+ my @defined_values = grep { defined } values %hash;
The C<keys()> function also resets the iterator, which means that you may
see strange results if you use this between uses of other hash operators
in ASCIIbetical order. Once we have the keys, we can go through them to
create a report which lists the keys in ASCIIbetical order.
- my @keys = sort { $a cmp $b } keys %hash;
+ my @keys = sort { $a cmp $b } keys %hash;
- foreach my $key ( @keys )
- {
- printf "%-20s %6d\n", $key, $hash{$key};
- }
+ foreach my $key ( @keys ) {
+ printf "%-20s %6d\n", $key, $hash{$key};
+ }
We could get more fancy in the C<sort()> block though. Instead of
comparing the keys, we can compute a value with them and use that
lowercase. The C<sort()> block then compares the lowercased
values to determine in which order to put the keys.
- my @keys = sort { "\L$a" cmp "\L$b" } keys %hash;
+ my @keys = sort { "\L$a" cmp "\L$b" } keys %hash;
Note: if the computation is expensive or the hash has many elements,
you may want to look at the Schwartzian Transform to cache the
to look it up. We still get out a list of keys, but this time they
are ordered by their value.
- my @keys = sort { $hash{$a} <=> $hash{$b} } keys %hash;
+ my @keys = sort { $hash{$a} <=> $hash{$b} } keys %hash;
From there we can get more complex. If the hash values are the same,
we can provide a secondary sort on the hash key.
- my @keys = sort {
- $hash{$a} <=> $hash{$b}
- or
- "\L$a" cmp "\L$b"
- } keys %hash;
+ my @keys = sort {
+ $hash{$a} <=> $hash{$b}
+ or
+ "\L$a" cmp "\L$b"
+ } keys %hash;
=head2 How can I always keep my hash sorted?
X<hash tie sort DB_File Tie::IxHash>
You can look into using the C<DB_File> module and C<tie()> using the
C<$DB_BTREE> hash bindings as documented in L<DB_File/"In Memory
-Databases">. The C<Tie::IxHash> module from CPAN might also be
+Databases">. The L<Tie::IxHash> module from CPAN might also be
instructive. Although this does keep your hash sorted, you might not
like the slowdown you suffer from the tie interface. Are you sure you
need to do this? :)
Pictures help... Here's the C<%hash> table:
- keys values
- +------+------+
- | a | 3 |
- | x | 7 |
- | d | 0 |
- | e | 2 |
- +------+------+
+ keys values
+ +------+------+
+ | a | 3 |
+ | x | 7 |
+ | d | 0 |
+ | e | 2 |
+ +------+------+
And these conditions hold
- $hash{'a'} is true
- $hash{'d'} is false
- defined $hash{'d'} is true
- defined $hash{'a'} is true
- exists $hash{'a'} is true (Perl 5 only)
- grep ($_ eq 'a', keys %hash) is true
+ $hash{'a'} is true
+ $hash{'d'} is false
+ defined $hash{'d'} is true
+ defined $hash{'a'} is true
+ exists $hash{'a'} is true (Perl 5 only)
+ grep ($_ eq 'a', keys %hash) is true
If you now say
- undef $hash{'a'}
+ undef $hash{'a'}
your table now reads:
- keys values
- +------+------+
- | a | undef|
- | x | 7 |
- | d | 0 |
- | e | 2 |
- +------+------+
+ keys values
+ +------+------+
+ | a | undef|
+ | x | 7 |
+ | d | 0 |
+ | e | 2 |
+ +------+------+
and these conditions now hold; changes in caps:
- $hash{'a'} is FALSE
- $hash{'d'} is false
- defined $hash{'d'} is true
- defined $hash{'a'} is FALSE
- exists $hash{'a'} is true (Perl 5 only)
- grep ($_ eq 'a', keys %hash) is true
+ $hash{'a'} is FALSE
+ $hash{'d'} is false
+ defined $hash{'d'} is true
+ defined $hash{'a'} is FALSE
+ exists $hash{'a'} is true (Perl 5 only)
+ grep ($_ eq 'a', keys %hash) is true
Notice the last two: you have an undef value, but a defined key!
Now, consider this:
- delete $hash{'a'}
+ delete $hash{'a'}
your table now reads:
- keys values
- +------+------+
- | x | 7 |
- | d | 0 |
- | e | 2 |
- +------+------+
+ keys values
+ +------+------+
+ | x | 7 |
+ | d | 0 |
+ | e | 2 |
+ +------+------+
and these conditions now hold; changes in caps:
- $hash{'a'} is false
- $hash{'d'} is false
- defined $hash{'d'} is true
- defined $hash{'a'} is false
- exists $hash{'a'} is FALSE (Perl 5 only)
- grep ($_ eq 'a', keys %hash) is FALSE
+ $hash{'a'} is false
+ $hash{'d'} is false
+ defined $hash{'d'} is true
+ defined $hash{'a'} is false
+ exists $hash{'a'} is FALSE (Perl 5 only)
+ grep ($_ eq 'a', keys %hash) is FALSE
See, the whole entry is gone!
simply reset the iterator used by C<each> without doing anything else,
use one of them in void context:
- keys %hash; # resets iterator, nothing else.
- values %hash; # resets iterator, nothing else.
+ keys %hash; # resets iterator, nothing else.
+ values %hash; # resets iterator, nothing else.
See the documentation for C<each> in L<perlfunc>.
First you extract the keys from the hashes into lists, then solve
the "removing duplicates" problem described above. For example:
- %seen = ();
- for $element (keys(%foo), keys(%bar)) {
- $seen{$element}++;
- }
- @uniq = keys %seen;
+ %seen = ();
+ for $element (keys(%foo), keys(%bar)) {
+ $seen{$element}++;
+ }
+ @uniq = keys %seen;
Or more succinctly:
- @uniq = keys %{{%foo,%bar}};
+ @uniq = keys %{{%foo,%bar}};
Or if you really want to save space:
- %seen = ();
- while (defined ($key = each %foo)) {
- $seen{$key}++;
- }
- while (defined ($key = each %bar)) {
- $seen{$key}++;
- }
- @uniq = keys %seen;
+ %seen = ();
+ while (defined ($key = each %foo)) {
+ $seen{$key}++;
+ }
+ while (defined ($key = each %bar)) {
+ $seen{$key}++;
+ }
+ @uniq = keys %seen;
=head2 How can I store a multidimensional array in a DBM file?
=head2 How can I make my hash remember the order I put elements into it?
-Use the C<Tie::IxHash> from CPAN.
+Use the L<Tie::IxHash> from CPAN.
- use Tie::IxHash;
+ use Tie::IxHash;
- tie my %myhash, 'Tie::IxHash';
+ tie my %myhash, 'Tie::IxHash';
- for (my $i=0; $i<20; $i++) {
- $myhash{$i} = 2*$i;
- }
+ for (my $i=0; $i<20; $i++) {
+ $myhash{$i} = 2*$i;
+ }
- my @keys = keys %myhash;
- # @keys = (0,1,2,3,...)
+ my @keys = keys %myhash;
+ # @keys = (0,1,2,3,...)
=head2 Why does passing a subroutine an undefined element in a hash create it?
Normally, accessing a hash key's value for a nonexistent key will
I<not> create the key.
- my %hash = ();
- my $value = $hash{ 'foo' };
- print "This won't print\n" if exists $hash{ 'foo' };
+ my %hash = ();
+ my $value = $hash{ 'foo' };
+ print "This won't print\n" if exists $hash{ 'foo' };
Passing C<$hash{ 'foo' }> to a subroutine used to be a special case, though.
Since you could assign directly to C<$_[0]>, Perl had to be ready to
make that assignment so it created the hash key ahead of time:
my_sub( $hash{ 'foo' } );
- print "This will print before 5.004\n" if exists $hash{ 'foo' };
+ print "This will print before 5.004\n" if exists $hash{ 'foo' };
- sub my_sub {
- # $_[0] = 'bar'; # create hash key in case you do this
- 1;
- }
+ sub my_sub {
+ # $_[0] = 'bar'; # create hash key in case you do this
+ 1;
+ }
Since Perl 5.004, however, this situation is a special case and Perl
creates the hash key only when you make the assignment:
my_sub( $hash{ 'foo' } );
- print "This will print, even after 5.004\n" if exists $hash{ 'foo' };
+ print "This will print, even after 5.004\n" if exists $hash{ 'foo' };
- sub my_sub {
- $_[0] = 'bar';
- }
+ sub my_sub {
+ $_[0] = 'bar';
+ }
However, if you want the old behavior (and think carefully about that
because it's a weird side effect), you can pass a hash slice instead.
Perl 5.004 didn't make this a special case:
- my_sub( @hash{ qw/foo/ } );
+ my_sub( @hash{ qw/foo/ } );
=head2 How can I make the Perl equivalent of a C structure/C++ class/hash or array of hashes or arrays?
Usually a hash ref, perhaps like this:
- $record = {
- NAME => "Jason",
- EMPNO => 132,
- TITLE => "deputy peon",
- AGE => 23,
- SALARY => 37_000,
- PALS => [ "Norbert", "Rhys", "Phineas"],
- };
+ $record = {
+ NAME => "Jason",
+ EMPNO => 132,
+ TITLE => "deputy peon",
+ AGE => 23,
+ SALARY => 37_000,
+ PALS => [ "Norbert", "Rhys", "Phineas"],
+ };
References are documented in L<perlref> and L<perlreftut>.
Examples of complex data structures are given in L<perldsc> and
The trick to this problem is avoiding accidental autovivification. If
you want to check three keys deep, you might naE<0xEF>vely try this:
- my %hash;
- if( exists $hash{key1}{key2}{key3} ) {
- ...;
- }
+ my %hash;
+ if( exists $hash{key1}{key2}{key3} ) {
+ ...;
+ }
Even though you started with a completely empty hash, after that call to
C<exists> you've created the structure you needed to check for C<key3>:
- %hash = (
- 'key1' => {
- 'key2' => {}
- }
- );
+ %hash = (
+ 'key1' => {
+ 'key2' => {}
+ }
+ );
That's autovivification. You can get around this in a few ways. The
easiest way is to just turn it off. The lexical C<autovivification>
pragma is available on CPAN. Now you don't add to the hash:
- {
- no autovivification;
- my %hash;
- if( exists $hash{key1}{key2}{key3} ) {
- ...;
- }
- }
+ {
+ no autovivification;
+ my %hash;
+ if( exists $hash{key1}{key2}{key3} ) {
+ ...;
+ }
+ }
-The C<Data::Diver> module on CPAN can do it for you too. Its C<Dive>
+The L<Data::Diver> module on CPAN can do it for you too. Its C<Dive>
subroutine can tell you not only if the keys exist but also get the
value:
- use Data::Diver qw(Dive);
+ use Data::Diver qw(Dive);
my @exists = Dive( \%hash, qw(key1 key2 key3) );
if( ! @exists ) {
...; # keys do not exist
- }
+ }
elsif( ! defined $exists[0] ) {
...; # keys exist but value is undef
- }
+ }
You can easily do this yourself too by checking each level of the hash
before you move onto the next level. This is essentially what
-C<Data::Diver> does for you:
+L<Data::Diver> does for you:
- if( check_hash( \%hash, qw(key1 key2 key3) ) ) {
- ...;
- }
+ if( check_hash( \%hash, qw(key1 key2 key3) ) ) {
+ ...;
+ }
- sub check_hash {
- my( $hash, @keys ) = @_;
+ sub check_hash {
+ my( $hash, @keys ) = @_;
- return unless @keys;
+ return unless @keys;
- foreach my $key ( @keys ) {
- return unless eval { exists $hash->{$key} };
- $hash = $hash->{$key};
- }
+ foreach my $key ( @keys ) {
+ return unless eval { exists $hash->{$key} };
+ $hash = $hash->{$key};
+ }
- return 1;
- }
+ return 1;
+ }
=head1 Data: Misc
Assuming that you don't care about IEEE notations like "NaN" or
"Infinity", you probably just want to use a regular expression:
- use 5.010;
-
- given( $number ) {
- when( /\D/ )
- { say "\thas nondigits"; continue }
- when( /^\d+\z/ )
- { say "\tis a whole number"; continue }
- when( /^-?\d+\z/ )
- { say "\tis an integer"; continue }
- when( /^[+-]?\d+\z/ )
- { say "\tis a +/- integer"; continue }
- when( /^-?(?:\d+\.?|\.\d)\d*\z/ )
- { say "\tis a real number"; continue }
- when( /^[+-]?(?=\.?\d)\d*\.?\d*(?:e[+-]?\d+)?\z/i)
- { say "\tis a C float" }
- }
+ use 5.010;
+
+ given( $number ) {
+ when( /\D/ )
+ { say "\thas nondigits"; continue }
+ when( /^\d+\z/ )
+ { say "\tis a whole number"; continue }
+ when( /^-?\d+\z/ )
+ { say "\tis an integer"; continue }
+ when( /^[+-]?\d+\z/ )
+ { say "\tis a +/- integer"; continue }
+ when( /^-?(?:\d+\.?|\.\d)\d*\z/ )
+ { say "\tis a real number"; continue }
+ when( /^[+-]?(?=\.?\d)\d*\.?\d*(?:e[+-]?\d+)?\z/i)
+ { say "\tis a C float" }
+ }
There are also some commonly used modules for the task.
L<Scalar::Util> (distributed with 5.8) provides access to perl's
internal function C<looks_like_number> for determining whether a
variable looks like a number. L<Data::Types> exports functions that
validate data types using both the above and other regular
-expressions. Thirdly, there is C<Regexp::Common> which has regular
+expressions. Thirdly, there is L<Regexp::Common> which has regular
expressions to match various types of numbers. Those three modules are
available from the CPAN.
that isn't a C float. The C<is_numeric> function is a front end to
C<getnum> if you just want to say, "Is this a float?"
- sub getnum {
- use POSIX qw(strtod);
- my $str = shift;
- $str =~ s/^\s+//;
- $str =~ s/\s+$//;
- $! = 0;
- my($num, $unparsed) = strtod($str);
- if (($str eq '') || ($unparsed != 0) || $!) {
- return undef;
- }
- else {
- return $num;
- }
- }
-
- sub is_numeric { defined getnum($_[0]) }
+ sub getnum {
+ use POSIX qw(strtod);
+ my $str = shift;
+ $str =~ s/^\s+//;
+ $str =~ s/\s+$//;
+ $! = 0;
+ my($num, $unparsed) = strtod($str);
+ if (($str eq '') || ($unparsed != 0) || $!) {
+ return undef;
+ }
+ else {
+ return $num;
+ }
+ }
+
+ sub is_numeric { defined getnum($_[0]) }
Or you could check out the L<String::Scanf> module on the CPAN
instead.
=head2 How do I keep persistent data across program calls?
For some specific applications, you can use one of the DBM modules.
-See L<AnyDBM_File>. More generically, you should consult the C<FreezeThaw>
-or C<Storable> modules from CPAN. Starting from Perl 5.8 C<Storable> is part
-of the standard distribution. Here's one example using C<Storable>'s C<store>
+See L<AnyDBM_File>. More generically, you should consult the L<FreezeThaw>
+or L<Storable> modules from CPAN. Starting from Perl 5.8 L<Storable> is part
+of the standard distribution. Here's one example using L<Storable>'s C<store>
and C<retrieve> functions:
- use Storable;
- store(\%hash, "filename");
+ use Storable;
+ store(\%hash, "filename");
- # later on...
- $href = retrieve("filename"); # by ref
- %hash = %{ retrieve("filename") }; # direct to hash
+ # later on...
+ $href = retrieve("filename"); # by ref
+ %hash = %{ retrieve("filename") }; # direct to hash
=head2 How do I print out or copy a recursive data structure?
-The C<Data::Dumper> module on CPAN (or the 5.005 release of Perl) is great
-for printing out data structures. The C<Storable> module on CPAN (or the
+The L<Data::Dumper> module on CPAN (or the 5.005 release of Perl) is great
+for printing out data structures. The L<Storable> module on CPAN (or the
5.8 release of Perl), provides a function called C<dclone> that recursively
copies its argument.
- use Storable qw(dclone);
- $r2 = dclone($r1);
+ use Storable qw(dclone);
+ $r2 = dclone($r1);
Where C<$r1> can be a reference to any kind of data structure you'd like.
It will be deeply copied. Because C<dclone> takes and returns references,
you'd have to add extra punctuation if you had a hash of arrays that
you wanted to copy.
- %newhash = %{ dclone(\%oldhash) };
+ %newhash = %{ dclone(\%oldhash) };
=head2 How do I define methods for every class/object?
=head2 How do I verify a credit card checksum?
-Get the C<Business::CreditCard> module from CPAN.
+Get the L<Business::CreditCard> module from CPAN.
=head2 How do I pack arrays of doubles or floats for XS code?
-The arrays.h/arrays.c code in the C<PGPLOT> module on CPAN does just this.
+The arrays.h/arrays.c code in the L<PGPLOT> module on CPAN does just this.
If you're doing a lot of float or double processing, consider using
-the C<PDL> module from CPAN instead--it makes number-crunching easy.
+the L<PDL> module from CPAN instead--it makes number-crunching easy.
See L<http://search.cpan.org/dist/PGPLOT> for the code.
Instead of seeing a dot for every line, Perl buffers the output and you
have a long wait before you see a row of 50 dots all at once:
- # long wait, then row of dots all at once
- while( <> ) {
- print ".";
- print "\n" unless ++$count % 50;
+ # long wait, then row of dots all at once
+ while( <> ) {
+ print ".";
+ print "\n" unless ++$count % 50;
- #... expensive line processing operations
- }
+ #... expensive line processing operations
+ }
To get around this, you have to unbuffer the output filehandle, in this
case, C<STDOUT>. You can set the special variable C<$|> to a true value
(mnemonic: making your filehandles "piping hot"):
- $|++;
+ $|++;
- # dot shown immediately
- while( <> ) {
- print ".";
- print "\n" unless ++$count % 50;
+ # dot shown immediately
+ while( <> ) {
+ print ".";
+ print "\n" unless ++$count % 50;
- #... expensive line processing operations
- }
+ #... expensive line processing operations
+ }
The C<$|> is one of the per-filehandle special variables, so each
filehandle has its own copy of its value. If you want to merge
standard output and standard error for instance, you have to unbuffer
each (although STDERR might be unbuffered by default):
- {
- my $previous_default = select(STDOUT); # save previous default
- $|++; # autoflush STDOUT
- select(STDERR);
- $|++; # autoflush STDERR, to be sure
- select($previous_default); # restore previous default
- }
-
- # now should alternate . and +
- while( 1 )
- {
- sleep 1;
- print STDOUT ".";
- print STDERR "+";
- print STDOUT "\n" unless ++$count % 25;
- }
+ {
+ my $previous_default = select(STDOUT); # save previous default
+ $|++; # autoflush STDOUT
+ select(STDERR);
+ $|++; # autoflush STDERR, to be sure
+ select($previous_default); # restore previous default
+ }
+
+ # now should alternate . and +
+ while( 1 ) {
+ sleep 1;
+ print STDOUT ".";
+ print STDERR "+";
+ print STDOUT "\n" unless ++$count % 25;
+ }
Besides the C<$|> special variable, you can use C<binmode> to give
your filehandle a C<:unix> layer, which is unbuffered:
- binmode( STDOUT, ":unix" );
+ binmode( STDOUT, ":unix" );
- while( 1 ) {
- sleep 1;
- print ".";
- print "\n" unless ++$count % 50;
- }
+ while( 1 ) {
+ sleep 1;
+ print ".";
+ print "\n" unless ++$count % 50;
+ }
For more information on output layers, see the entries for C<binmode>
-and C<open> in L<perlfunc>, and the C<PerlIO> module documentation.
+and L<open> in L<perlfunc>, and the L<PerlIO> module documentation.
-If you are using C<IO::Handle> or one of its subclasses, you can
+If you are using L<IO::Handle> or one of its subclasses, you can
call the C<autoflush> method to change the settings of the
filehandle:
- use IO::Handle;
- open my( $io_fh ), ">", "output.txt";
- $io_fh->autoflush(1);
+ use IO::Handle;
+ open my( $io_fh ), ">", "output.txt";
+ $io_fh->autoflush(1);
-The C<IO::Handle> objects also have a C<flush> method. You can flush
+The L<IO::Handle> objects also have a C<flush> method. You can flush
the buffer any time you want without auto-buffering
- $io_fh->flush;
+ $io_fh->flush;
=head2 How do I change, delete, or insert a line in a file, or append to the beginning of a file?
X<file, editing>
make the change, making the change, then reading and printing the rest
of the file. Perl doesn't provide random access to lines (especially
since the record input separator, C<$/>, is mutable), although modules
-such as C<Tie::File> can fake it.
+such as L<Tie::File> can fake it.
A Perl program to do these tasks takes the basic form of opening a
file, printing its lines, then closing the file:
- open my $in, '<', $file or die "Can't read old file: $!";
- open my $out, '>', "$file.new" or die "Can't write new file: $!";
+ open my $in, '<', $file or die "Can't read old file: $!";
+ open my $out, '>', "$file.new" or die "Can't write new file: $!";
- while( <$in> )
- {
- print $out $_;
- }
+ while( <$in> ) {
+ print $out $_;
+ }
- close $out;
+ close $out;
Within that basic form, add the parts that you need to insert, change,
or delete lines.
To prepend lines to the beginning, print those lines before you enter
the loop that prints the existing lines.
- open my $in, '<', $file or die "Can't read old file: $!";
- open my $out, '>', "$file.new" or die "Can't write new file: $!";
+ open my $in, '<', $file or die "Can't read old file: $!";
+ open my $out, '>', "$file.new" or die "Can't write new file: $!";
- print $out "# Add this line to the top\n"; # <--- HERE'S THE MAGIC
+ print $out "# Add this line to the top\n"; # <--- HERE'S THE MAGIC
- while( <$in> )
- {
- print $out $_;
- }
+ while( <$in> ) {
+ print $out $_;
+ }
- close $out;
+ close $out;
To change existing lines, insert the code to modify the lines inside
the C<while> loop. In this case, the code finds all lowercased
versions of "perl" and uppercases them. The happens for every line, so
be sure that you're supposed to do that on every line!
- open my $in, '<', $file or die "Can't read old file: $!";
- open my $out, '>', "$file.new" or die "Can't write new file: $!";
+ open my $in, '<', $file or die "Can't read old file: $!";
+ open my $out, '>', "$file.new" or die "Can't write new file: $!";
- print $out "# Add this line to the top\n";
+ print $out "# Add this line to the top\n";
- while( <$in> )
- {
- s/\b(perl)\b/Perl/g;
- print $out $_;
- }
+ while( <$in> ) {
+ s/\b(perl)\b/Perl/g;
+ print $out $_;
+ }
- close $out;
+ close $out;
To change only a particular line, the input line number, C<$.>, is
useful. First read and print the lines up to the one you want to
change. Next, read the single line you want to change, change it, and
print it. After that, read the rest of the lines and print those:
- while( <$in> ) # print the lines before the change
- {
- print $out $_;
- last if $. == 4; # line number before change
- }
+ while( <$in> ) { # print the lines before the change
+ print $out $_;
+ last if $. == 4; # line number before change
+ }
- my $line = <$in>;
- $line =~ s/\b(perl)\b/Perl/g;
- print $out $line;
+ my $line = <$in>;
+ $line =~ s/\b(perl)\b/Perl/g;
+ print $out $line;
- while( <$in> ) # print the rest of the lines
- {
- print $out $_;
- }
+ while( <$in> ) { # print the rest of the lines
+ print $out $_;
+ }
To skip lines, use the looping controls. The C<next> in this example
skips comment lines, and the C<last> stops all processing once it
encounters either C<__END__> or C<__DATA__>.
- while( <$in> )
- {
- next if /^\s+#/; # skip comment lines
- last if /^__(END|DATA)__$/; # stop at end of code marker
- print $out $_;
- }
+ while( <$in> ) {
+ next if /^\s+#/; # skip comment lines
+ last if /^__(END|DATA)__$/; # stop at end of code marker
+ print $out $_;
+ }
Do the same sort of thing to delete a particular line by using C<next>
to skip the lines you don't want to show up in the output. This
example skips every fifth line:
- while( <$in> )
- {
- next unless $. % 5;
- print $out $_;
- }
+ while( <$in> ) {
+ next unless $. % 5;
+ print $out $_;
+ }
If, for some odd reason, you really want to see the whole file at once
rather than processing line-by-line, you can slurp it in (as long as
you can fit the whole thing in memory!):
- open my $in, '<', $file or die "Can't read old file: $!"
- open my $out, '>', "$file.new" or die "Can't write new file: $!";
+ open my $in, '<', $file or die "Can't read old file: $!"
+ open my $out, '>', "$file.new" or die "Can't write new file: $!";
- my @lines = do { local $/; <$in> }; # slurp!
+ my @lines = do { local $/; <$in> }; # slurp!
- # do your magic here
+ # do your magic here
- print $out @lines;
+ print $out @lines;
-Modules such as C<File::Slurp> and C<Tie::File> can help with that
+Modules such as L<File::Slurp> and L<Tie::File> can help with that
too. If you can, however, avoid reading the entire file at once. Perl
won't give that memory back to the operating system until the process
finishes.
automatically prints the value of C<$_> at the end of the loop. See
L<perlrun> for more details.
- perl -pi -e 's/Fred/Barney/' inFile.txt
+ perl -pi -e 's/Fred/Barney/' inFile.txt
To make a backup of C<inFile.txt>, give C<-i> a file extension to add:
- perl -pi.bak -e 's/Fred/Barney/' inFile.txt
+ perl -pi.bak -e 's/Fred/Barney/' inFile.txt
To change only the fifth line, you can add a test checking C<$.>, the
input line number, then only perform the operation when the test
passes:
- perl -pi -e 's/Fred/Barney/ if $. == 5' inFile.txt
+ perl -pi -e 's/Fred/Barney/ if $. == 5' inFile.txt
To add lines before a certain line, you can add a line (or lines!)
before Perl prints C<$_>:
- perl -pi -e 'print "Put before third line\n" if $. == 3' inFile.txt
+ perl -pi -e 'print "Put before third line\n" if $. == 3' inFile.txt
You can even add a line to the beginning of a file, since the current
line prints at the end of the loop:
- perl -pi -e 'print "Put before first line\n" if $. == 1' inFile.txt
+ perl -pi -e 'print "Put before first line\n" if $. == 1' inFile.txt
To insert a line after one already in the file, use the C<-n> switch.
It's just like C<-p> except that it doesn't print C<$_> at the end of
the loop, so you have to do that yourself. In this case, print C<$_>
first, then print the line that you want to add.
- perl -ni -e 'print; print "Put after fifth line\n" if $. == 5' inFile.txt
+ perl -ni -e 'print; print "Put after fifth line\n" if $. == 5' inFile.txt
To delete lines, only print the ones that you want.
- perl -ni -e 'print unless /d/' inFile.txt
+ perl -ni -e 'print unless /d/' inFile.txt
- ... or ...
+ ... or ...
- perl -pi -e 'next unless /d/' inFile.txt
+ perl -pi -e 'next unless /d/' inFile.txt
=head2 How do I count the number of lines in a file?
X<file, counting lines> X<lines> X<line>
Conceptually, the easiest way to count the lines in a file is to
simply read them and count them:
- my $count = 0;
- while( <$fh> ) { $count++; }
+ my $count = 0;
+ while( <$fh> ) { $count++; }
You don't really have to count them yourself, though, since Perl
already does that with the C<$.> variable, which is the current line
number from the last filehandle read:
- 1 while( <$fh> );
- my $count = $.;
+ 1 while( <$fh> );
+ my $count = $.;
If you want to use C<$.>, you can reduce it to a simple one-liner,
like one of these:
- % perl -lne '} print $.; {' file
+ % perl -lne '} print $.; {' file
- % perl -lne 'END { print $. }' file
+ % perl -lne 'END { print $. }' file
Those can be rather inefficient though. If they aren't fast enough for
you, you might just read chunks of data and count the number of
newlines:
- my $lines = 0;
- open my($fh), '<:raw', $filename or die "Can't open $filename: $!";
- while( sysread $fh, $buffer, 4096 ) {
- $lines += ( $buffer =~ tr/\n// );
- }
- close FILE;
+ my $lines = 0;
+ open my($fh), '<:raw', $filename or die "Can't open $filename: $!";
+ while( sysread $fh, $buffer, 4096 ) {
+ $lines += ( $buffer =~ tr/\n// );
+ }
+ close FILE;
However, that doesn't work if the line ending isn't a newline. You
might change that C<tr///> to a C<s///> so you can count the number of
times the input record separator, C<$/>, shows up:
- my $lines = 0;
- open my($fh), '<:raw', $filename or die "Can't open $filename: $!";
- while( sysread $fh, $buffer, 4096 ) {
- $lines += ( $buffer =~ s|$/||g; );
- }
- close FILE;
+ my $lines = 0;
+ open my($fh), '<:raw', $filename or die "Can't open $filename: $!";
+ while( sysread $fh, $buffer, 4096 ) {
+ $lines += ( $buffer =~ s|$/||g; );
+ }
+ close FILE;
If you don't mind shelling out, the C<wc> command is usually the
fastest, even with the extra interprocess overhead. Ensure that you
have an untainted filename though:
- #!perl -T
+ #!perl -T
- $ENV{PATH} = undef;
+ $ENV{PATH} = undef;
- my $lines;
- if( $filename =~ /^([0-9a-z_.]+)\z/ ) {
- $lines = `/usr/bin/wc -l $1`
- chomp $lines;
- }
+ my $lines;
+ if( $filename =~ /^([0-9a-z_.]+)\z/ ) {
+ $lines = `/usr/bin/wc -l $1`
+ chomp $lines;
+ }
=head2 How do I delete the last N lines from a file?
X<lines> X<file>
without a lot of copying. The easy concept is the hard reality when
you might have millions of lines in your file.
-One trick is to use C<File::ReadBackwards>, which starts at the end of
+One trick is to use L<File::ReadBackwards>, which starts at the end of
the file. That module provides an object that wraps the real filehandle
to make it easy for you to move around the file. Once you get to the
spot you need, you can get the actual filehandle and work with it as
normal. In this case, you get the file position at the end of the last
line you want to keep and truncate the file to that point:
- use File::ReadBackwards;
+ use File::ReadBackwards;
- my $filename = 'test.txt';
- my $Lines_to_truncate = 2;
+ my $filename = 'test.txt';
+ my $Lines_to_truncate = 2;
- my $bw = File::ReadBackwards->new( $filename )
- or die "Could not read backwards in [$filename]: $!";
+ my $bw = File::ReadBackwards->new( $filename )
+ or die "Could not read backwards in [$filename]: $!";
- my $lines_from_end = 0;
- until( $bw->eof or $lines_from_end == $Lines_to_truncate )
- {
- print "Got: ", $bw->readline;
- $lines_from_end++;
- }
+ my $lines_from_end = 0;
+ until( $bw->eof or $lines_from_end == $Lines_to_truncate ) {
+ print "Got: ", $bw->readline;
+ $lines_from_end++;
+ }
- truncate( $filename, $bw->tell );
+ truncate( $filename, $bw->tell );
-The C<File::ReadBackwards> module also has the advantage of setting
+The L<File::ReadBackwards> module also has the advantage of setting
the input record separator to a regular expression.
-You can also use the C<Tie::File> module which lets you access
+You can also use the L<Tie::File> module which lets you access
the lines through a tied array. You can use normal array operations
to modify your file, including setting the last index and using
C<splice>.
modifying the appropriate variables directly, you can get the same
behavior within a larger program. For example:
- # ...
- {
- local($^I, @ARGV) = ('.orig', glob("*.c"));
- while (<>) {
- if ($. == 1) {
- print "This line should appear at the top of each file\n";
- }
- s/\b(p)earl\b/${1}erl/i; # Correct typos, preserving case
- print;
- close ARGV if eof; # Reset $.
- }
- }
- # $^I and @ARGV return to their old values here
+ # ...
+ {
+ local($^I, @ARGV) = ('.orig', glob("*.c"));
+ while (<>) {
+ if ($. == 1) {
+ print "This line should appear at the top of each file\n";
+ }
+ s/\b(p)earl\b/${1}erl/i; # Correct typos, preserving case
+ print;
+ close ARGV if eof; # Reset $.
+ }
+ }
+ # $^I and @ARGV return to their old values here
This block modifies all the C<.c> files in the current directory,
leaving a backup of the original data from each file in a new
(contributed by brian d foy)
-Use the C<File::Copy> module. It comes with Perl and can do a
+Use the L<File::Copy> module. It comes with Perl and can do a
true copy across file systems, and it does its magic in
a portable fashion.
- use File::Copy;
+ use File::Copy;
- copy( $original, $new_copy ) or die "Copy failed: $!";
+ copy( $original, $new_copy ) or die "Copy failed: $!";
-If you can't use C<File::Copy>, you'll have to do the work yourself:
+If you can't use L<File::Copy>, you'll have to do the work yourself:
open the original file, open the destination file, then print
to the destination file as you read the original. You also have to
remember to copy the permissions, owner, and group to the new file.
with C<undef> in place of the file name. In Perl 5.8 or later, the
C<open()> function creates an anonymous temporary file:
- open my $tmp, '+>', undef or die $!;
+ open my $tmp, '+>', undef or die $!;
Otherwise, you can use the File::Temp module.
- use File::Temp qw/ tempfile tempdir /;
+ use File::Temp qw/ tempfile tempdir /;
- my $dir = tempdir( CLEANUP => 1 );
- ($fh, $filename) = tempfile( DIR => $dir );
+ my $dir = tempdir( CLEANUP => 1 );
+ ($fh, $filename) = tempfile( DIR => $dir );
- # or if you don't need to know the filename
+ # or if you don't need to know the filename
- my $fh = tempfile( DIR => $dir );
+ my $fh = tempfile( DIR => $dir );
The File::Temp has been a standard module since Perl 5.6.1. If you
don't have a modern enough Perl installed, use the C<new_tmpfile>
class method from the IO::File module to get a filehandle opened for
reading and writing. Use it if you don't need to know the file's name:
- use IO::File;
- my $fh = IO::File->new_tmpfile()
- or die "Unable to make new temporary file: $!";
+ use IO::File;
+ my $fh = IO::File->new_tmpfile()
+ or die "Unable to make new temporary file: $!";
If you're committed to creating a temporary file by hand, use the
process ID and/or the current time-value. If you need to have many
temporary files in one process, use a counter:
- BEGIN {
- use Fcntl;
- my $temp_dir = -d '/tmp' ? '/tmp' : $ENV{TMPDIR} || $ENV{TEMP};
- my $base_name = sprintf "%s/%d-%d-0000", $temp_dir, $$, time;
-
- sub temp_file {
- my $fh;
- my $count = 0;
- until( defined(fileno($fh)) || $count++ > 100 ) {
- $base_name =~ s/-(\d+)$/"-" . (1 + $1)/e;
- # O_EXCL is required for security reasons.
- sysopen $fh, $base_name, O_WRONLY|O_EXCL|O_CREAT;
- }
-
- if( defined fileno($fh) ) {
- return ($fh, $base_name);
- }
- else {
- return ();
- }
- }
-
- }
+ BEGIN {
+ use Fcntl;
+ my $temp_dir = -d '/tmp' ? '/tmp' : $ENV{TMPDIR} || $ENV{TEMP};
+ my $base_name = sprintf "%s/%d-%d-0000", $temp_dir, $$, time;
+
+ sub temp_file {
+ my $fh;
+ my $count = 0;
+ until( defined(fileno($fh)) || $count++ > 100 ) {
+ $base_name =~ s/-(\d+)$/"-" . (1 + $1)/e;
+ # O_EXCL is required for security reasons.
+ sysopen $fh, $base_name, O_WRONLY|O_EXCL|O_CREAT;
+ }
+
+ if( defined fileno($fh) ) {
+ return ($fh, $base_name);
+ }
+ else {
+ return ();
+ }
+ }
+ }
=head2 How can I manipulate fixed-record-length files?
X<fixed-length> X<file, fixed-length records>
some fixed-format input lines, in this case from the output of a normal,
Berkeley-style ps:
- # sample input line:
- # 15158 p5 T 0:00 perl /home/tchrist/scripts/now-what
- my $PS_T = 'A6 A4 A7 A5 A*';
- open my $ps, '-|', 'ps';
- print scalar <$ps>;
- my @fields = qw( pid tt stat time command );
- while (<$ps>) {
- my %process;
- @process{@fields} = unpack($PS_T, $_);
- for my $field ( @fields ) {
- print "$field: <$process{$field}>\n";
- }
- print 'line=', pack($PS_T, @process{@fields} ), "\n";
- }
+ # sample input line:
+ # 15158 p5 T 0:00 perl /home/tchrist/scripts/now-what
+ my $PS_T = 'A6 A4 A7 A5 A*';
+ open my $ps, '-|', 'ps';
+ print scalar <$ps>;
+ my @fields = qw( pid tt stat time command );
+ while (<$ps>) {
+ my %process;
+ @process{@fields} = unpack($PS_T, $_);
+ for my $field ( @fields ) {
+ print "$field: <$process{$field}>\n";
+ }
+ print 'line=', pack($PS_T, @process{@fields} ), "\n";
+ }
We've used a hash slice in order to easily handle the fields of each row.
Storing the keys in an array makes it easy to operate on them as a
You can then pass these references just like any other scalar,
and use them in the place of named handles.
- open my $fh, $file_name;
+ open my $fh, $file_name;
- open local $fh, $file_name;
+ open local $fh, $file_name;
- print $fh "Hello World!\n";
+ print $fh "Hello World!\n";
- process_file( $fh );
+ process_file( $fh );
If you like, you can store these filehandles in an array or a hash.
If you access them directly, they aren't simple scalars and you
reference in braces. Perl can only figure it out on its own when
the filehandle reference is a simple scalar.
- my @fhs = ( $fh1, $fh2, $fh3 );
+ my @fhs = ( $fh1, $fh2, $fh3 );
- for( $i = 0; $i <= $#fhs; $i++ ) {
- print {$fhs[$i]} "just another Perl answer, \n";
- }
+ for( $i = 0; $i <= $#fhs; $i++ ) {
+ print {$fhs[$i]} "just another Perl answer, \n";
+ }
Before perl5.6, you had to deal with various typeglob idioms
which you may see in older code.
- open FILE, "> $filename";
- process_typeglob( *FILE );
- process_reference( \*FILE );
+ open FILE, "> $filename";
+ process_typeglob( *FILE );
+ process_reference( \*FILE );
- sub process_typeglob { local *FH = shift; print FH "Typeglob!" }
- sub process_reference { local $fh = shift; print $fh "Reference!" }
+ sub process_typeglob { local *FH = shift; print FH "Typeglob!" }
+ sub process_reference { local $fh = shift; print $fh "Reference!" }
If you want to create many anonymous handles, you should
check out the Symbol or IO::Handle modules.
in a place that a filehandle is expected. Here are ways
to get indirect filehandles:
- $fh = SOME_FH; # bareword is strict-subs hostile
- $fh = "SOME_FH"; # strict-refs hostile; same package only
- $fh = *SOME_FH; # typeglob
- $fh = \*SOME_FH; # ref to typeglob (bless-able)
- $fh = *SOME_FH{IO}; # blessed IO::Handle from *SOME_FH typeglob
+ $fh = SOME_FH; # bareword is strict-subs hostile
+ $fh = "SOME_FH"; # strict-refs hostile; same package only
+ $fh = *SOME_FH; # typeglob
+ $fh = \*SOME_FH; # ref to typeglob (bless-able)
+ $fh = *SOME_FH{IO}; # blessed IO::Handle from *SOME_FH typeglob
Or, you can use the C<new> method from one of the IO::* modules to
create an anonymous filehandle and store that in a scalar variable.
- use IO::Handle; # 5.004 or higher
- my $fh = IO::Handle->new();
+ use IO::Handle; # 5.004 or higher
+ my $fh = IO::Handle->new();
Then use any of those as you would a normal filehandle. Anywhere that
Perl is expecting a filehandle, an indirect filehandle may be used
the C<< <FH> >> diamond operator will accept either a named filehandle
or a scalar variable containing one:
- ($ifh, $ofh, $efh) = (*STDIN, *STDOUT, *STDERR);
- print $ofh "Type it: ";
- my $got = <$ifh>
- print $efh "What was that: $got";
+ ($ifh, $ofh, $efh) = (*STDIN, *STDOUT, *STDERR);
+ print $ofh "Type it: ";
+ my $got = <$ifh>
+ print $efh "What was that: $got";
If you're passing a filehandle to a function, you can write
the function in two ways:
- sub accept_fh {
- my $fh = shift;
- print $fh "Sending to indirect filehandle\n";
- }
+ sub accept_fh {
+ my $fh = shift;
+ print $fh "Sending to indirect filehandle\n";
+ }
Or it can localize a typeglob and use the filehandle directly:
- sub accept_fh {
- local *FH = shift;
- print FH "Sending to localized filehandle\n";
- }
+ sub accept_fh {
+ local *FH = shift;
+ print FH "Sending to localized filehandle\n";
+ }
Both styles work with either objects or typeglobs of real filehandles.
(They might also work with strings under some circumstances, but this
is risky.)
- accept_fh(*STDOUT);
- accept_fh($handle);
+ accept_fh(*STDOUT);
+ accept_fh($handle);
In the examples above, we assigned the filehandle to a scalar variable
before using it. That is because only simple scalar variables, not
something other than a simple scalar variable as a filehandle is
illegal and won't even compile:
- my @fd = (*STDIN, *STDOUT, *STDERR);
- print $fd[1] "Type it: "; # WRONG
- my $got = <$fd[0]> # WRONG
- print $fd[2] "What was that: $got"; # WRONG
+ my @fd = (*STDIN, *STDOUT, *STDERR);
+ print $fd[1] "Type it: "; # WRONG
+ my $got = <$fd[0]> # WRONG
+ print $fd[2] "What was that: $got"; # WRONG
With C<print> and C<printf>, you get around this by using a block and
an expression where you would place the filehandle:
- print { $fd[1] } "funny stuff\n";
- printf { $fd[1] } "Pity the poor %x.\n", 3_735_928_559;
- # Pity the poor deadbeef.
+ print { $fd[1] } "funny stuff\n";
+ printf { $fd[1] } "Pity the poor %x.\n", 3_735_928_559;
+ # Pity the poor deadbeef.
That block is a proper block like any other, so you can put more
complicated code there. This sends the message out to one of two places:
- my $ok = -x "/bin/cat";
- print { $ok ? $fd[1] : $fd[2] } "cat stat $ok\n";
- print { $fd[ 1+ ($ok || 0) ] } "cat stat $ok\n";
+ my $ok = -x "/bin/cat";
+ print { $ok ? $fd[1] : $fd[2] } "cat stat $ok\n";
+ print { $fd[ 1+ ($ok || 0) ] } "cat stat $ok\n";
This approach of treating C<print> and C<printf> like object methods
calls doesn't work for the diamond operator. That's because it's a
would work, but only because readline() requires a typeglob. It doesn't
work with objects or strings, which might be a bug we haven't fixed yet.
- $got = readline($fd[0]);
+ $got = readline($fd[0]);
Let it be noted that the flakiness of indirect filehandles is not
related to whether they're strings, typeglobs, objects, or anything else.
If you want to C<write> into a string, you just have to <open> a
filehandle to a string, which Perl has been able to do since Perl 5.6:
- open FH, '>', \my $string;
- write( FH );
+ open FH, '>', \my $string;
+ write( FH );
Since you want to be a good programmer, you probably want to use a lexical
filehandle, even though formats are designed to work with bareword filehandles
names the top-of-page format, and C<$~> which shows the line format. You have
to change the default filehandle to set these variables:
- open my($fh), '>', \my $string;
+ open my($fh), '>', \my $string;
- { # set per-filehandle variables
- my $old_fh = select( $fh );
- $~ = 'ANIMAL';
- $^ = 'ANIMAL_TOP';
- select( $old_fh );
- }
+ { # set per-filehandle variables
+ my $old_fh = select( $fh );
+ $~ = 'ANIMAL';
+ $^ = 'ANIMAL_TOP';
+ select( $old_fh );
+ }
- format ANIMAL_TOP =
- ID Type Name
- .
+ format ANIMAL_TOP =
+ ID Type Name
+ .
- format ANIMAL =
- @## @<<< @<<<<<<<<<<<<<<
- $id, $type, $name
- .
+ format ANIMAL =
+ @## @<<< @<<<<<<<<<<<<<<
+ $id, $type, $name
+ .
Although write can work with lexical or package variables, whatever variables
you use have to scope in the format. That most likely means you'll want to
localize some package variables:
- {
- local( $id, $type, $name ) = qw( 12 cat Buster );
- write( $fh );
- }
+ {
+ local( $id, $type, $name ) = qw( 12 cat Buster );
+ write( $fh );
+ }
- print $string;
+ print $string;
There are also some tricks that you can play with C<formline> and the
accumulator variable C<$^A>, but you lose a lot of the value of formats
calling open with a reference to that string instead of the filename.
This file handle can then be used to read from or write to the string:
- open(my $fh, '>', \$string) or die "Could not open string for writing";
- print $fh "foo\n";
- print $fh "bar\n"; # $string now contains "foo\nbar\n"
+ open(my $fh, '>', \$string) or die "Could not open string for writing";
+ print $fh "foo\n";
+ print $fh "bar\n"; # $string now contains "foo\nbar\n"
- open(my $fh, '<', \$string) or die "Could not open string for reading";
- my $x = <$fh>; # $x now contains "foo\n"
+ open(my $fh, '<', \$string) or die "Could not open string for reading";
+ my $x = <$fh>; # $x now contains "foo\n"
-With older versions of Perl, the C<IO::String> module provides similar
+With older versions of Perl, the L<IO::String> module provides similar
functionality.
=head2 How can I output my numbers with commas added?
This subroutine will add commas to your number:
- sub commify {
- local $_ = shift;
- 1 while s/^([-+]?\d+)(\d{3})/$1,$2/;
- return $_;
- }
+ sub commify {
+ local $_ = shift;
+ 1 while s/^([-+]?\d+)(\d{3})/$1,$2/;
+ return $_;
+ }
This regex from Benjamin Goldberg will add commas to numbers:
- s/(^[-+]?\d+?(?=(?>(?:\d{3})+)(?!\d))|\G\d{3}(?=\d))/$1,/g;
+ s/(^[-+]?\d+?(?=(?>(?:\d{3})+)(?!\d))|\G\d{3}(?=\d))/$1,/g;
It is easier to see with comments:
- s/(
- ^[-+]? # beginning of number.
- \d+? # first digits before first comma
- (?= # followed by, (but not included in the match) :
- (?>(?:\d{3})+) # some positive multiple of three digits.
- (?!\d) # an *exact* multiple, not x * 3 + 1 or whatever.
- )
- | # or:
- \G\d{3} # after the last group, get three digits
- (?=\d) # but they have to have more digits after them.
- )/$1,/xg;
+ s/(
+ ^[-+]? # beginning of number.
+ \d+? # first digits before first comma
+ (?= # followed by, (but not included in the match) :
+ (?>(?:\d{3})+) # some positive multiple of three digits.
+ (?!\d) # an *exact* multiple, not x * 3 + 1 or whatever.
+ )
+ | # or:
+ \G\d{3} # after the last group, get three digits
+ (?=\d) # but they have to have more digits after them.
+ )/$1,/xg;
=head2 How can I translate tildes (~) in a filename?
X<tilde> X<tilde expansion>
Use the E<lt>E<gt> (C<glob()>) operator, documented in L<perlfunc>.
Versions of Perl older than 5.6 require that you have a shell
installed that groks tildes. Later versions of Perl have this feature
-built in. The C<File::KGlob> module (available from CPAN) gives more
+built in. The L<File::KGlob> module (available from CPAN) gives more
portable glob functionality.
Within Perl, you may use this directly:
- $filename =~ s{
- ^ ~ # find a leading tilde
- ( # save this in $1
- [^/] # a non-slash character
- * # repeated 0 or more times (0 means me)
- )
- }{
- $1
- ? (getpwnam($1))[7]
- : ( $ENV{HOME} || $ENV{LOGDIR} )
- }ex;
+ $filename =~ s{
+ ^ ~ # find a leading tilde
+ ( # save this in $1
+ [^/] # a non-slash character
+ * # repeated 0 or more times (0 means me)
+ )
+ }{
+ $1
+ ? (getpwnam($1))[7]
+ : ( $ENV{HOME} || $ENV{LOGDIR} )
+ }ex;
=head2 How come when I open a file read-write it wipes it out?
X<clobber> X<read-write> X<clobbering> X<truncate> X<truncating>
Because you're using something like this, which truncates the file
I<then> gives you read-write access:
- open my $fh, '+>', '/path/name'; # WRONG (almost always)
+ open my $fh, '+>', '/path/name'; # WRONG (almost always)
Whoops. You should instead use this, which will fail if the file
doesn't exist:
- open my $fh, '+<', '/path/name'; # open for update
+ open my $fh, '+<', '/path/name'; # open for update
Using ">" always clobbers or creates. Using "<" never does
either. The "+" doesn't change this.
Here are examples of many kinds of file opens. Those using C<sysopen>
-all assume that you've pulled in the constants from C<Fcntl>:
+all assume that you've pulled in the constants from L<Fcntl>:
- use Fcntl;
+ use Fcntl;
To open file for reading:
- open my $fh, '<', $path or die $!;
- sysopen my $fh, $path, O_RDONLY or die $!;
+ open my $fh, '<', $path or die $!;
+ sysopen my $fh, $path, O_RDONLY or die $!;
To open file for writing, create new file if needed or else truncate old file:
- open my $fh, '>', $path or die $!;
- sysopen my $fh, $path, O_WRONLY|O_TRUNC|O_CREAT or die $!;
- sysopen my $fh, $path, O_WRONLY|O_TRUNC|O_CREAT, 0666 or die $!;
+ open my $fh, '>', $path or die $!;
+ sysopen my $fh, $path, O_WRONLY|O_TRUNC|O_CREAT or die $!;
+ sysopen my $fh, $path, O_WRONLY|O_TRUNC|O_CREAT, 0666 or die $!;
To open file for writing, create new file, file must not exist:
- sysopen my $fh, $path, O_WRONLY|O_EXCL|O_CREAT or die $!;
- sysopen my $fh, $path, O_WRONLY|O_EXCL|O_CREAT, 0666 or die $!;
+ sysopen my $fh, $path, O_WRONLY|O_EXCL|O_CREAT or die $!;
+ sysopen my $fh, $path, O_WRONLY|O_EXCL|O_CREAT, 0666 or die $!;
To open file for appending, create if necessary:
- open my $fh, '>>' $path or die $!;
- sysopen my $fh, $path, O_WRONLY|O_APPEND|O_CREAT or die $!;
- sysopen my $fh, $path, O_WRONLY|O_APPEND|O_CREAT, 0666 or die $!;
+ open my $fh, '>>' $path or die $!;
+ sysopen my $fh, $path, O_WRONLY|O_APPEND|O_CREAT or die $!;
+ sysopen my $fh, $path, O_WRONLY|O_APPEND|O_CREAT, 0666 or die $!;
To open file for appending, file must exist:
- sysopen my $fh, $path, O_WRONLY|O_APPEND or die $!;
+ sysopen my $fh, $path, O_WRONLY|O_APPEND or die $!;
To open file for update, file must exist:
- open my $fh, '+<', $path or die $!;
- sysopen my $fh, $path, O_RDWR or die $!;
+ open my $fh, '+<', $path or die $!;
+ sysopen my $fh, $path, O_RDWR or die $!;
To open file for update, create file if necessary:
- sysopen my $fh, $path, O_RDWR|O_CREAT or die $!;
- sysopen my $fh, $path, O_RDWR|O_CREAT, 0666 or die $!;
+ sysopen my $fh, $path, O_RDWR|O_CREAT or die $!;
+ sysopen my $fh, $path, O_RDWR|O_CREAT, 0666 or die $!;
To open file for update, file must not exist:
- sysopen my $fh, $path, O_RDWR|O_EXCL|O_CREAT or die $!;
- sysopen my $fh, $path, O_RDWR|O_EXCL|O_CREAT, 0666 or die $!;
+ sysopen my $fh, $path, O_RDWR|O_EXCL|O_CREAT or die $!;
+ sysopen my $fh, $path, O_RDWR|O_EXCL|O_CREAT, 0666 or die $!;
To open a file without blocking, creating if necessary:
- sysopen my $fh, '/foo/somefile', O_WRONLY|O_NDELAY|O_CREAT
- or die "can't open /foo/somefile: $!":
+ sysopen my $fh, '/foo/somefile', O_WRONLY|O_NDELAY|O_CREAT
+ or die "can't open /foo/somefile: $!":
Be warned that neither creation nor deletion of files is guaranteed to
be an atomic operation over NFS. That is, two processes might both
should use the three-argument form of open() which does not treat any
characters in the filename as special.
- open my $fh, "<", " file "; # filename is " file "
- open my $fh, ">", ">file"; # filename is ">file"
+ open my $fh, "<", " file "; # filename is " file "
+ open my $fh, ">", ">file"; # filename is ">file"
=head2 How can I reliably rename a file?
X<rename> X<mv> X<move> X<file, rename>
If your operating system supports a proper mv(1) utility or its
functional equivalent, this works:
- rename($old, $new) or system("mv", $old, $new);
+ rename($old, $new) or system("mv", $old, $new);
-It may be more portable to use the C<File::Copy> module instead.
+It may be more portable to use the L<File::Copy> module instead.
You just copy to the new file to the new name (checking return
values), then delete the old one. This isn't really the same
semantically as a C<rename()>, which preserves meta-information like
A common bit of code B<NOT TO USE> is this:
- sleep(3) while -e 'file.lock'; # PLEASE DO NOT USE
- open my $lock, '>', 'file.lock'; # THIS BROKEN CODE
+ sleep(3) while -e 'file.lock'; # PLEASE DO NOT USE
+ open my $lock, '>', 'file.lock'; # THIS BROKEN CODE
This is a classic race condition: you take two steps to do something
which must be done in one. That's why computer hardware provides an
atomic test-and-set instruction. In theory, this "ought" to work:
- sysopen my $fh, "file.lock", O_WRONLY|O_EXCL|O_CREAT
- or die "can't open file.lock: $!";
+ sysopen my $fh, "file.lock", O_WRONLY|O_EXCL|O_CREAT
+ or die "can't open file.lock: $!";
except that lamentably, file creation (and deletion) is not atomic
over NFS, so this won't work (at least, not every time) over the net.
Anyway, this is what you can do if you can't help yourself.
- use Fcntl qw(:DEFAULT :flock);
- sysopen my $fh, "numfile", O_RDWR|O_CREAT or die "can't open numfile: $!";
- flock $fh, LOCK_EX or die "can't flock numfile: $!";
- my $num = <$fh> || 0;
- seek $fh, 0, 0 or die "can't rewind numfile: $!";
- truncate $fh, 0 or die "can't truncate numfile: $!";
- (print $fh $num+1, "\n") or die "can't write numfile: $!";
- close $fh or die "can't close numfile: $!";
+ use Fcntl qw(:DEFAULT :flock);
+ sysopen my $fh, "numfile", O_RDWR|O_CREAT or die "can't open numfile: $!";
+ flock $fh, LOCK_EX or die "can't flock numfile: $!";
+ my $num = <$fh> || 0;
+ seek $fh, 0, 0 or die "can't rewind numfile: $!";
+ truncate $fh, 0 or die "can't truncate numfile: $!";
+ (print $fh $num+1, "\n") or die "can't write numfile: $!";
+ close $fh or die "can't close numfile: $!";
Here's a much better web-page hit counter:
- $hits = int( (time() - 850_000_000) / rand(1_000) );
+ $hits = int( (time() - 850_000_000) / rand(1_000) );
If the count doesn't impress your friends, then the code might. :-)
If you're just trying to patch a binary, in many cases something as
simple as this works:
- perl -i -pe 's{window manager}{window mangler}g' /usr/bin/emacs
+ perl -i -pe 's{window manager}{window mangler}g' /usr/bin/emacs
However, if you have fixed sized records, then you might do something more
like this:
- $RECSIZE = 220; # size of record, in bytes
- $recno = 37; # which record to update
- open my $fh, '+<', 'somewhere' or die "can't update somewhere: $!";
- seek $fh, $recno * $RECSIZE, 0;
- read $fh, $record, $RECSIZE == $RECSIZE or die "can't read record $recno: $!";
- # munge the record
- seek $fh, -$RECSIZE, 1;
- print $fh $record;
- close $fh;
+ $RECSIZE = 220; # size of record, in bytes
+ $recno = 37; # which record to update
+ open my $fh, '+<', 'somewhere' or die "can't update somewhere: $!";
+ seek $fh, $recno * $RECSIZE, 0;
+ read $fh, $record, $RECSIZE == $RECSIZE or die "can't read record $recno: $!";
+ # munge the record
+ seek $fh, -$RECSIZE, 1;
+ print $fh $record;
+ close $fh;
Locking and error checking are left as an exercise for the reader.
Don't forget them or you'll be quite sorry.
Here's an example:
- my $write_secs = (stat($file))[9];
- printf "file %s updated at %s\n", $file,
- scalar localtime($write_secs);
+ my $write_secs = (stat($file))[9];
+ printf "file %s updated at %s\n", $file,
+ scalar localtime($write_secs);
If you prefer something more legible, use the File::stat module
(part of the standard distribution in version 5.004 and later):
- # error checking left as an exercise for reader.
- use File::stat;
- use Time::localtime;
- my $date_string = ctime(stat($file)->mtime);
- print "file $file updated at $date_string\n";
+ # error checking left as an exercise for reader.
+ use File::stat;
+ use Time::localtime;
+ my $date_string = ctime(stat($file)->mtime);
+ print "file $file updated at $date_string\n";
The POSIX::strftime() approach has the benefit of being,
in theory, independent of the current locale. See L<perllocale>
read and write times from its first argument to all the rest
of them.
- if (@ARGV < 2) {
- die "usage: cptimes timestamp_file other_files ...\n";
- }
- my $timestamp = shift;
- my($atime, $mtime) = (stat($timestamp))[8,9];
- utime $atime, $mtime, @ARGV;
+ if (@ARGV < 2) {
+ die "usage: cptimes timestamp_file other_files ...\n";
+ }
+ my $timestamp = shift;
+ my($atime, $mtime) = (stat($timestamp))[8,9];
+ utime $atime, $mtime, @ARGV;
Error checking is, as usual, left as an exercise for the reader.
If you only have to do this once, you can print individually
to each filehandle.
- for my $fh (FH1, FH2, FH3) { print $fh "whatever\n" }
+ for my $fh (FH1, FH2, FH3) { print $fh "whatever\n" }
=head2 How can I read in an entire file all at once?
X<slurp> X<file, slurping>
The customary Perl approach for processing all the lines in a file is to
do so one line at a time:
- open my $input, '<', $file or die "can't open $file: $!";
- while (<$input>) {
- chomp;
- # do something with $_
- }
- close $input or die "can't close $file: $!";
+ open my $input, '<', $file or die "can't open $file: $!";
+ while (<$input>) {
+ chomp;
+ # do something with $_
+ }
+ close $input or die "can't close $file: $!";
This is tremendously more efficient than reading the entire file into
memory as an array of lines and then processing it one element at a time,
which is often--if not almost always--the wrong approach. Whenever
you see someone do this:
- my @lines = <INPUT>;
+ my @lines = <INPUT>;
You should think long and hard about why you need everything loaded at
once. It's just not a scalable solution.
CPAN, you can virtually load the entire file into a
string without actually storing it in memory:
- use File::Map qw(map_file);
+ use File::Map qw(map_file);
- map_file my $string, $filename;
+ map_file my $string, $filename;
Once mapped, you can treat C<$string> as you would any other string.
Since you don't necessarily have to load the data, mmap-ing can be
very fast and may not increase your memory footprint.
You might also find it more
-fun to use the standard C<Tie::File> module, or the C<DB_File> module's
+fun to use the standard L<Tie::File> module, or the L<DB_File> module's
C<$DB_RECNO> bindings, which allow you to tie an array to a file so that
accessing an element of the array actually accesses the corresponding
line in the file.
-If you want to load the entire file, you can use the C<File::Slurp>
+If you want to load the entire file, you can use the L<File::Slurp>
module to do it in one one simple and efficient step:
- use File::Slurp;
+ use File::Slurp;
- my $all_of_it = read_file($filename); # entire file in scalar
- my @all_lines = read_file($filename); # one line per element
+ my $all_of_it = read_file($filename); # entire file in scalar
+ my @all_lines = read_file($filename); # one line per element
Or you can read the entire file contents into a scalar like this:
- my $var;
- {
- local $/;
- open my $fh, '<', $file or die "can't open $file: $!";
- $var = <$fh>;
- }
+ my $var;
+ {
+ local $/;
+ open my $fh, '<', $file or die "can't open $file: $!";
+ $var = <$fh>;
+ }
That temporarily undefs your record separator, and will automatically
close the file at block exit. If the file is already open, just use this:
- my $var = do { local $/; <$fh> };
+ my $var = do { local $/; <$fh> };
You can also use a localized C<@ARGV> to eliminate the C<open>:
- my $var = do { local( @ARGV, $/ ) = $file; <> };
+ my $var = do { local( @ARGV, $/ ) = $file; <> };
For ordinary files you can also use the C<read> function.
- read( $fh, $var, -s $fh );
+ read( $fh, $var, -s $fh );
That third argument tests the byte size of the data on the C<$fh> filehandle
and reads that many bytes into the buffer C<$var>.
interface (POSIX), you can use the following code, which you'll note
turns off echo processing as well.
- #!/usr/bin/perl -w
- use strict;
- $| = 1;
- for (1..4) {
- print "gimme: ";
- my $got = getone();
- print "--> $got\n";
- }
+ #!/usr/bin/perl -w
+ use strict;
+ $| = 1;
+ for (1..4) {
+ print "gimme: ";
+ my $got = getone();
+ print "--> $got\n";
+ }
exit;
- BEGIN {
- use POSIX qw(:termios_h);
+ BEGIN {
+ use POSIX qw(:termios_h);
- my ($term, $oterm, $echo, $noecho, $fd_stdin);
+ my ($term, $oterm, $echo, $noecho, $fd_stdin);
- my $fd_stdin = fileno(STDIN);
+ my $fd_stdin = fileno(STDIN);
- $term = POSIX::Termios->new();
- $term->getattr($fd_stdin);
- $oterm = $term->getlflag();
+ $term = POSIX::Termios->new();
+ $term->getattr($fd_stdin);
+ $oterm = $term->getlflag();
- $echo = ECHO | ECHOK | ICANON;
- $noecho = $oterm & ~$echo;
+ $echo = ECHO | ECHOK | ICANON;
+ $noecho = $oterm & ~$echo;
- sub cbreak {
- $term->setlflag($noecho);
- $term->setcc(VTIME, 1);
- $term->setattr($fd_stdin, TCSANOW);
- }
+ sub cbreak {
+ $term->setlflag($noecho);
+ $term->setcc(VTIME, 1);
+ $term->setattr($fd_stdin, TCSANOW);
+ }
- sub cooked {
- $term->setlflag($oterm);
- $term->setcc(VTIME, 0);
- $term->setattr($fd_stdin, TCSANOW);
- }
+ sub cooked {
+ $term->setlflag($oterm);
+ $term->setcc(VTIME, 0);
+ $term->setattr($fd_stdin, TCSANOW);
+ }
- sub getone {
- my $key = '';
- cbreak();
- sysread(STDIN, $key, 1);
- cooked();
- return $key;
- }
+ sub getone {
+ my $key = '';
+ cbreak();
+ sysread(STDIN, $key, 1);
+ cooked();
+ return $key;
+ }
+ }
- }
-
- END { cooked() }
+ END { cooked() }
The Term::ReadKey module from CPAN may be easier to use. Recent versions
include also support for non-portable systems as well.
- use Term::ReadKey;
- open my $tty, '<', '/dev/tty';
- print "Gimme a char: ";
- ReadMode "raw";
- my $key = ReadKey 0, $tty;
- ReadMode "normal";
- printf "\nYou said %s, char number %03d\n",
- $key, ord $key;
+ use Term::ReadKey;
+ open my $tty, '<', '/dev/tty';
+ print "Gimme a char: ";
+ ReadMode "raw";
+ my $key = ReadKey 0, $tty;
+ ReadMode "normal";
+ printf "\nYou said %s, char number %03d\n",
+ $key, ord $key;
=head2 How can I tell whether there's a character waiting on a filehandle?
It's very system-dependent. Here's one solution that works on BSD
systems:
- sub key_ready {
- my($rin, $nfd);
- vec($rin, fileno(STDIN), 1) = 1;
- return $nfd = select($rin,undef,undef,0);
- }
+ sub key_ready {
+ my($rin, $nfd);
+ vec($rin, fileno(STDIN), 1) = 1;
+ return $nfd = select($rin,undef,undef,0);
+ }
If you want to find out how many characters are waiting, there's
also the FIONREAD ioctl call to be looked at. The I<h2ph> tool that
can be C<require>d. FIONREAD ends up defined as a function in the
I<sys/ioctl.ph> file:
- require 'sys/ioctl.ph';
+ require 'sys/ioctl.ph';
- $size = pack("L", 0);
- ioctl(FH, FIONREAD(), $size) or die "Couldn't call ioctl: $!\n";
- $size = unpack("L", $size);
+ $size = pack("L", 0);
+ ioctl(FH, FIONREAD(), $size) or die "Couldn't call ioctl: $!\n";
+ $size = unpack("L", $size);
If I<h2ph> wasn't installed or doesn't work for you, you can
I<grep> the include files by hand:
- % grep FIONREAD /usr/include/*/*
- /usr/include/asm/ioctls.h:#define FIONREAD 0x541B
+ % grep FIONREAD /usr/include/*/*
+ /usr/include/asm/ioctls.h:#define FIONREAD 0x541B
Or write a small C program using the editor of champions:
- % cat > fionread.c
- #include <sys/ioctl.h>
- main() {
- printf("%#08x\n", FIONREAD);
- }
- ^D
- % cc -o fionread fionread.c
- % ./fionread
- 0x4004667f
+ % cat > fionread.c
+ #include <sys/ioctl.h>
+ main() {
+ printf("%#08x\n", FIONREAD);
+ }
+ ^D
+ % cc -o fionread fionread.c
+ % ./fionread
+ 0x4004667f
And then hard-code it, leaving porting as an exercise to your successor.
- $FIONREAD = 0x4004667f; # XXX: opsys dependent
+ $FIONREAD = 0x4004667f; # XXX: opsys dependent
- $size = pack("L", 0);
- ioctl(FH, $FIONREAD, $size) or die "Couldn't call ioctl: $!\n";
- $size = unpack("L", $size);
+ $size = pack("L", 0);
+ ioctl(FH, $FIONREAD, $size) or die "Couldn't call ioctl: $!\n";
+ $size = unpack("L", $size);
FIONREAD requires a filehandle connected to a stream, meaning that sockets,
pipes, and tty devices work, but I<not> files.
First try
- seek(GWFILE, 0, 1);
+ seek(GWFILE, 0, 1);
The statement C<seek(GWFILE, 0, 1)> doesn't change the current position,
but it does clear the end-of-file condition on the handle, so that the
If that doesn't work (it relies on features of your stdio implementation),
then you need something more like this:
- for (;;) {
- for ($curpos = tell(GWFILE); <GWFILE>; $curpos = tell(GWFILE)) {
- # search for some stuff and put it into files
- }
- # sleep for a while
- seek(GWFILE, $curpos, 0); # seek to where we had been
- }
+ for (;;) {
+ for ($curpos = tell(GWFILE); <GWFILE>; $curpos = tell(GWFILE)) {
+ # search for some stuff and put it into files
+ }
+ # sleep for a while
+ seek(GWFILE, $curpos, 0); # seek to where we had been
+ }
If this still doesn't work, look into the C<clearerr> method
-from C<IO::Handle>, which resets the error and end-of-file states
+from L<IO::Handle>, which resets the error and end-of-file states
on the handle.
-There's also a C<File::Tail> module from CPAN.
+There's also a L<File::Tail> module from CPAN.
=head2 How do I dup() a filehandle in Perl?
X<dup>
If you check L<perlfunc/open>, you'll see that several of the ways
to call open() should do the trick. For example:
- open my $log, '>>', '/foo/logfile';
- open STDERR, '>&LOG';
+ open my $log, '>>', '/foo/logfile';
+ open STDERR, '>&LOG';
Or even with a literal numeric descriptor:
- my $fd = $ENV{MHCONTEXTFD};
- open $mhcontext, "<&=$fd"; # like fdopen(3S)
+ my $fd = $ENV{MHCONTEXTFD};
+ open $mhcontext, "<&=$fd"; # like fdopen(3S)
Note that "<&STDIN" makes a copy, but "<&=STDIN" makes
an alias. That means if you close an aliased handle, all
If, for some reason, you have a file descriptor instead of a
filehandle (perhaps you used C<POSIX::open>), you can use the
-C<close()> function from the C<POSIX> module:
+C<close()> function from the L<POSIX> module:
- use POSIX ();
+ use POSIX ();
- POSIX::close( $fd );
+ POSIX::close( $fd );
This should rarely be necessary, as the Perl C<close()> function is to be
used for things that Perl opened itself, even if it was a dup of a
numeric descriptor as with C<MHCONTEXT> above. But if you really have
to, you may be able to do this:
- require 'sys/syscall.ph';
- my $rc = syscall(&SYS_close, $fd + 0); # must force numeric
- die "can't sysclose $fd: $!" unless $rc == -1;
+ require 'sys/syscall.ph';
+ my $rc = syscall(&SYS_close, $fd + 0); # must force numeric
+ die "can't sysclose $fd: $!" unless $rc == -1;
Or, just use the fdopen(3S) feature of C<open()>:
- {
- open my( $fh ), "<&=$fd" or die "Cannot reopen fd=$fd: $!";
- close $fh;
- }
+ {
+ open my( $fh ), "<&=$fd" or die "Cannot reopen fd=$fd: $!";
+ close $fh;
+ }
=head2 Why can't I use "C:\temp\foo" in DOS paths? Why doesn't `C:\temp\foo.exe` work?
X<filename, DOS issues>
Here's a reservoir-sampling algorithm from the Camel Book:
- srand;
- rand($.) < 1 && ($line = $_) while <>;
+ srand;
+ rand($.) < 1 && ($line = $_) while <>;
This has a significant advantage in space over reading the whole file
in. You can find a proof of this method in I<The Art of Computer
Programming>, Volume 2, Section 3.4.2, by Donald E. Knuth.
-You can use the C<File::Random> module which provides a function
+You can use the L<File::Random> module which provides a function
for that algorithm:
- use File::Random qw/random_line/;
- my $line = random_line($filename);
+ use File::Random qw/random_line/;
+ my $line = random_line($filename);
-Another way is to use the C<Tie::File> module, which treats the entire
+Another way is to use the L<Tie::File> module, which treats the entire
file as an array. Simply access a random array element.
=head2 Why do I get weird spaces when I print an array of lines?
you print the array, you are probably interpolating the array in
double quotes:
- my @animals = qw(camel llama alpaca vicuna);
- print "animals are: @animals\n";
+ my @animals = qw(camel llama alpaca vicuna);
+ print "animals are: @animals\n";
It's the double quotes, not the C<print>, doing this. Whenever you
interpolate an array in a double quote context, Perl joins the
elements with spaces (or whatever is in C<$">, which is a space by
default):
- animals are: camel llama alpaca vicuna
+ animals are: camel llama alpaca vicuna
This is different than printing the array without the interpolation:
- my @animals = qw(camel llama alpaca vicuna);
- print "animals are: ", @animals, "\n";
+ my @animals = qw(camel llama alpaca vicuna);
+ print "animals are: ", @animals, "\n";
Now the output doesn't have the spaces between the elements because
the elements of C<@animals> simply become part of the list to
C<print>:
- animals are: camelllamaalpacavicuna
+ animals are: camelllamaalpacavicuna
You might notice this when each of the elements of C<@array> end with
a newline. You expect to print one element per line, but notice that
every line after the first is indented:
- this is a line
- this is another line
- this is the third line
+ this is a line
+ this is another line
+ this is the third line
That extra space comes from the interpolation of the array. If you
don't want to put anything between your array elements, don't use the
array in double quotes. You can send it to print without them:
- print @lines;
+ print @lines;
=head2 How do I traverse a directory tree?
(contributed by brian d foy)
-The C<File::Find> module, which comes with Perl, does all of the hard
+The L<File::Find> module, which comes with Perl, does all of the hard
work to traverse a directory structure. It comes with Perl. You simply
call the C<find> subroutine with a callback subroutine and the
directories you want to traverse:
- use File::Find;
+ use File::Find;
- find( \&wanted, @directories );
+ find( \&wanted, @directories );
- sub wanted {
- # full path in $File::Find::name
- # just filename in $_
- ... do whatever you want to do ...
- }
+ sub wanted {
+ # full path in $File::Find::name
+ # just filename in $_
+ ... do whatever you want to do ...
+ }
-The C<File::Find::Closures>, which you can download from CPAN, provides
-many ready-to-use subroutines that you can use with C<File::Find>.
+The L<File::Find::Closures>, which you can download from CPAN, provides
+many ready-to-use subroutines that you can use with L<File::Find>.
-The C<File::Finder>, which you can download from CPAN, can help you
+The L<File::Finder>, which you can download from CPAN, can help you
create the callback subroutine using something closer to the syntax of
the C<find> command-line utility:
- use File::Find;
- use File::Finder;
+ use File::Find;
+ use File::Finder;
- my $deep_dirs = File::Finder->depth->type('d')->ls->exec('rmdir','{}');
+ my $deep_dirs = File::Finder->depth->type('d')->ls->exec('rmdir','{}');
- find( $deep_dirs->as_options, @places );
+ find( $deep_dirs->as_options, @places );
-The C<File::Find::Rule> module, which you can download from CPAN, has
+The L<File::Find::Rule> module, which you can download from CPAN, has
a similar interface, but does the traversal for you too:
- use File::Find::Rule;
+ use File::Find::Rule;
- my @files = File::Find::Rule->file()
- ->name( '*.pm' )
- ->in( @INC );
+ my @files = File::Find::Rule->file()
+ ->name( '*.pm' )
+ ->in( @INC );
=head2 How do I delete a directory tree?
either have to empty it yourself (a lot of work) or use a module to
help you.
-The C<File::Path> module, which comes with Perl, has a C<remove_tree>
+The L<File::Path> module, which comes with Perl, has a C<remove_tree>
which can take care of all of the hard work for you:
- use File::Path qw(remove_tree);
+ use File::Path qw(remove_tree);
- remove_tree( @directories );
+ remove_tree( @directories );
-The C<File::Path> module also has a legacy interface to the older
+The L<File::Path> module also has a legacy interface to the older
C<rmtree> subroutine.
=head2 How do I copy an entire directory?
Describe what you're doing and how you're doing it, using normal Perl
comments.
- # turn the line into the first word, a colon, and the
- # number of characters on the rest of the line
- s/^(\w+)(.*)/ lc($1) . ":" . length($2) /meg;
+ # turn the line into the first word, a colon, and the
+ # number of characters on the rest of the line
+ s/^(\w+)(.*)/ lc($1) . ":" . length($2) /meg;
=item Comments Inside the Regex
C</x> lets you turn this:
- s{<(?:[^>'"]*|".*?"|'.*?')+>}{}gs;
+ s{<(?:[^>'"]*|".*?"|'.*?')+>}{}gs;
into this:
- s{ < # opening angle bracket
- (?: # Non-backreffing grouping paren
- [^>'"] * # 0 or more things that are neither > nor ' nor "
- | # or else
- ".*?" # a section between double quotes (stingy match)
- | # or else
- '.*?' # a section between single quotes (stingy match)
- ) + # all occurring one or more times
- > # closing angle bracket
- }{}gsx; # replace with nothing, i.e. delete
+ s{ < # opening angle bracket
+ (?: # Non-backreffing grouping paren
+ [^>'"] * # 0 or more things that are neither > nor ' nor "
+ | # or else
+ ".*?" # a section between double quotes (stingy match)
+ | # or else
+ '.*?' # a section between single quotes (stingy match)
+ ) + # all occurring one or more times
+ > # closing angle bracket
+ }{}gsx; # replace with nothing, i.e. delete
It's still not quite so clear as prose, but it is very useful for
describing the meaning of each part of the pattern.
delimiters. Selecting another delimiter can avoid quoting the
delimiter within the pattern:
- s/\/usr\/local/\/usr\/share/g; # bad delimiter choice
- s#/usr/local#/usr/share#g; # better
+ s/\/usr\/local/\/usr\/share/g; # bad delimiter choice
+ s#/usr/local#/usr/share#g; # better
=back
than the default, or else we won't actually ever have a multiline
record read in.
- $/ = ''; # read in whole paragraph, not just one line
- while ( <> ) {
- while ( /\b([\w'-]+)(\s+\g1)+\b/gi ) { # word starts alpha
- print "Duplicate $1 at paragraph $.\n";
- }
- }
+ $/ = ''; # read in whole paragraph, not just one line
+ while ( <> ) {
+ while ( /\b([\w'-]+)(\s+\g1)+\b/gi ) { # word starts alpha
+ print "Duplicate $1 at paragraph $.\n";
+ }
+ }
Here's code that finds sentences that begin with "From " (which would
be mangled by many mailers):
- $/ = ''; # read in whole paragraph, not just one line
- while ( <> ) {
- while ( /^From /gm ) { # /m makes ^ match next to \n
- print "leading from in paragraph $.\n";
- }
- }
+ $/ = ''; # read in whole paragraph, not just one line
+ while ( <> ) {
+ while ( /^From /gm ) { # /m makes ^ match next to \n
+ print "leading from in paragraph $.\n";
+ }
+ }
Here's code that finds everything between START and END in a paragraph:
- undef $/; # read in whole file, not just one line or paragraph
- while ( <> ) {
- while ( /START(.*?)END/sgm ) { # /s makes . cross line boundaries
- print "$1\n";
- }
- }
+ undef $/; # read in whole file, not just one line or paragraph
+ while ( <> ) {
+ while ( /START(.*?)END/sgm ) { # /s makes . cross line boundaries
+ print "$1\n";
+ }
+ }
=head2 How can I pull out lines between two patterns that are themselves on different lines?
X<..>
You can use Perl's somewhat exotic C<..> operator (documented in
L<perlop>):
- perl -ne 'print if /START/ .. /END/' file1 file2 ...
+ perl -ne 'print if /START/ .. /END/' file1 file2 ...
If you wanted text and not lines, you would use
- perl -0777 -ne 'print "$1\n" while /START(.*?)END/gs' file1 file2 ...
+ perl -0777 -ne 'print "$1\n" while /START(.*?)END/gs' file1 file2 ...
But if you want nested occurrences of C<START> through C<END>, you'll
run up against the problem described in the question in this section
Here's another example of using C<..>:
- while (<>) {
- $in_header = 1 .. /^$/;
- $in_body = /^$/ .. eof;
- # now choose between them
- } continue {
- $. = 0 if eof; # fix $.
- }
+ while (<>) {
+ $in_header = 1 .. /^$/;
+ $in_body = /^$/ .. eof;
+ # now choose between them
+ } continue {
+ $. = 0 if eof; # fix $.
+ }
=head2 How do I match XML, HTML, or other nasty, ugly things with a regex?
X<regex, XML> X<regex, HTML> X<XML> X<HTML> X<pain> X<frustration>
(contributed by brian d foy)
If you just want to get work done, use a module and forget about the
-regular expressions. The C<XML::Parser> and C<HTML::Parser> modules
+regular expressions. The L<XML::Parser> and L<HTML::Parser> modules
are good starts, although each namespace has other parsing modules
specialized for certain tasks and different ways of doing it. Start at
CPAN Search ( http://search.cpan.org ) and wonder at all the work people
If you have File::Stream, this is easy.
- use File::Stream;
+ use File::Stream;
- my $stream = File::Stream->new(
- $filehandle,
- separator => qr/\s*,\s*/,
- );
+ my $stream = File::Stream->new(
+ $filehandle,
+ separator => qr/\s*,\s*/,
+ );
- print "$_\n" while <$stream>;
+ print "$_\n" while <$stream>;
If you don't have File::Stream, you have to do a little more work.
a buffer. After you add to the buffer, you check if you have a
complete line (using your regular expression).
- local $_ = "";
- while( sysread FH, $_, 8192, length ) {
- while( s/^((?s).*?)your_pattern// ) {
- my $record = $1;
- # do stuff here.
- }
- }
+ local $_ = "";
+ while( sysread FH, $_, 8192, length ) {
+ while( s/^((?s).*?)your_pattern// ) {
+ my $record = $1;
+ # do stuff here.
+ }
+ }
You can do the same thing with foreach and a match using the
c flag and the \G anchor, if you do not mind your entire file
being in memory at the end.
- local $_ = "";
- while( sysread FH, $_, 8192, length ) {
- foreach my $record ( m/\G((?s).*?)your_pattern/gc ) {
- # do stuff here.
- }
- substr( $_, 0, pos ) = "" if pos;
- }
+ local $_ = "";
+ while( sysread FH, $_, 8192, length ) {
+ foreach my $record ( m/\G((?s).*?)your_pattern/gc ) {
+ # do stuff here.
+ }
+ substr( $_, 0, pos ) = "" if pos;
+ }
=head2 How do I substitute case-insensitively on the LHS while preserving case on the RHS?
Here's a lovely Perlish solution by Larry Rosler. It exploits
properties of bitwise xor on ASCII strings.
- $_= "this is a TEsT case";
+ $_= "this is a TEsT case";
- $old = 'test';
- $new = 'success';
+ $old = 'test';
+ $new = 'success';
- s{(\Q$old\E)}
- { uc $new | (uc $1 ^ $1) .
- (uc(substr $1, -1) ^ substr $1, -1) x
- (length($new) - length $1)
- }egi;
+ s{(\Q$old\E)}
+ { uc $new | (uc $1 ^ $1) .
+ (uc(substr $1, -1) ^ substr $1, -1) x
+ (length($new) - length $1)
+ }egi;
- print;
+ print;
And here it is as a subroutine, modeled after the above:
- sub preserve_case($$) {
- my ($old, $new) = @_;
- my $mask = uc $old ^ $old;
+ sub preserve_case($$) {
+ my ($old, $new) = @_;
+ my $mask = uc $old ^ $old;
- uc $new | $mask .
- substr($mask, -1) x (length($new) - length($old))
+ uc $new | $mask .
+ substr($mask, -1) x (length($new) - length($old))
}
- $string = "this is a TEsT case";
- $string =~ s/(test)/preserve_case($1, "success")/egi;
- print "$string\n";
+ $string = "this is a TEsT case";
+ $string =~ s/(test)/preserve_case($1, "success")/egi;
+ print "$string\n";
This prints:
- this is a SUcCESS case
+ this is a SUcCESS case
As an alternative, to keep the case of the replacement word if it is
longer than the original, you can use this code, by Jeff Pinyan:
- sub preserve_case {
- my ($from, $to) = @_;
- my ($lf, $lt) = map length, @_;
+ sub preserve_case {
+ my ($from, $to) = @_;
+ my ($lf, $lt) = map length, @_;
- if ($lt < $lf) { $from = substr $from, 0, $lt }
- else { $from .= substr $to, $lf }
+ if ($lt < $lf) { $from = substr $from, 0, $lt }
+ else { $from .= substr $to, $lf }
- return uc $to | ($from ^ uc $from);
- }
+ return uc $to | ($from ^ uc $from);
+ }
This changes the sentence to "this is a SUcCess case."
If the substitution has more characters than the string being substituted,
the case of the last character is used for the rest of the substitution.
- # Original by Nathan Torkington, massaged by Jeffrey Friedl
- #
- sub preserve_case($$)
- {
- my ($old, $new) = @_;
- my ($state) = 0; # 0 = no change; 1 = lc; 2 = uc
- my ($i, $oldlen, $newlen, $c) = (0, length($old), length($new));
- my ($len) = $oldlen < $newlen ? $oldlen : $newlen;
-
- for ($i = 0; $i < $len; $i++) {
- if ($c = substr($old, $i, 1), $c =~ /[\W\d_]/) {
- $state = 0;
- } elsif (lc $c eq $c) {
- substr($new, $i, 1) = lc(substr($new, $i, 1));
- $state = 1;
- } else {
- substr($new, $i, 1) = uc(substr($new, $i, 1));
- $state = 2;
- }
- }
- # finish up with any remaining new (for when new is longer than old)
- if ($newlen > $oldlen) {
- if ($state == 1) {
- substr($new, $oldlen) = lc(substr($new, $oldlen));
- } elsif ($state == 2) {
- substr($new, $oldlen) = uc(substr($new, $oldlen));
- }
- }
- return $new;
- }
+ # Original by Nathan Torkington, massaged by Jeffrey Friedl
+ #
+ sub preserve_case($$)
+ {
+ my ($old, $new) = @_;
+ my ($state) = 0; # 0 = no change; 1 = lc; 2 = uc
+ my ($i, $oldlen, $newlen, $c) = (0, length($old), length($new));
+ my ($len) = $oldlen < $newlen ? $oldlen : $newlen;
+
+ for ($i = 0; $i < $len; $i++) {
+ if ($c = substr($old, $i, 1), $c =~ /[\W\d_]/) {
+ $state = 0;
+ } elsif (lc $c eq $c) {
+ substr($new, $i, 1) = lc(substr($new, $i, 1));
+ $state = 1;
+ } else {
+ substr($new, $i, 1) = uc(substr($new, $i, 1));
+ $state = 2;
+ }
+ }
+ # finish up with any remaining new (for when new is longer than old)
+ if ($newlen > $oldlen) {
+ if ($state == 1) {
+ substr($new, $oldlen) = lc(substr($new, $oldlen));
+ } elsif ($state == 2) {
+ substr($new, $oldlen) = uc(substr($new, $oldlen));
+ }
+ }
+ return $new;
+ }
=head2 How can I make C<\w> match national character sets?
X<\w>
also that any regex special characters will be acted on unless you
precede the substitution with \Q. Here's an example:
- $string = "Placido P. Octopus";
- $regex = "P.";
+ $string = "Placido P. Octopus";
+ $regex = "P.";
- $string =~ s/$regex/Polyp/;
- # $string is now "Polypacido P. Octopus"
+ $string =~ s/$regex/Polyp/;
+ # $string is now "Polypacido P. Octopus"
Because C<.> is special in regular expressions, and can match any
single character, the regex C<P.> here has matched the <Pl> in the
To escape the special meaning of C<.>, we use C<\Q>:
- $string = "Placido P. Octopus";
- $regex = "P.";
+ $string = "Placido P. Octopus";
+ $regex = "P.";
- $string =~ s/\Q$regex/Polyp/;
- # $string is now "Placido Polyp Octopus"
+ $string =~ s/\Q$regex/Polyp/;
+ # $string is now "Placido Polyp Octopus"
The use of C<\Q> causes the <.> in the regex to be treated as a
regular character, so that C<P.> matches a C<P> followed by a dot.
This example takes a regular expression from the argument list and
prints the lines of input that match it:
- my $pattern = shift @ARGV;
+ my $pattern = shift @ARGV;
- while( <> ) {
- print if m/$pattern/;
- }
+ while( <> ) {
+ print if m/$pattern/;
+ }
Versions of Perl prior to 5.6 would recompile the regular expression
for each iteration, even if C<$pattern> had not changed. The C</o>
would prevent this by telling Perl to compile the pattern the first
time, then reuse that for subsequent iterations:
- my $pattern = shift @ARGV;
+ my $pattern = shift @ARGV;
- while( <> ) {
- print if m/$pattern/o; # useful for Perl < 5.6
- }
+ while( <> ) {
+ print if m/$pattern/o; # useful for Perl < 5.6
+ }
In versions 5.6 and later, Perl won't recompile the regular expression
if the variable hasn't changed, so you probably don't need the C</o>
compiling the regular expression on each iteration. With Perl 5.6 or
later, you should only see C<re> report that for the first iteration.
- use re 'debug';
+ use re 'debug';
- $regex = 'Perl';
- foreach ( qw(Perl Java Ruby Python) ) {
- print STDERR "-" x 73, "\n";
- print STDERR "Trying $_...\n";
- print STDERR "\t$_ is good!\n" if m/$regex/;
- }
+ $regex = 'Perl';
+ foreach ( qw(Perl Java Ruby Python) ) {
+ print STDERR "-" x 73, "\n";
+ print STDERR "Trying $_...\n";
+ print STDERR "\t$_ is good!\n" if m/$regex/;
+ }
=head2 How do I use a regular expression to strip C-style comments from a file?
While this actually can be done, it's much harder than you'd think.
For example, this one-liner
- perl -0777 -pe 's{/\*.*?\*/}{}gs' foo.c
+ perl -0777 -pe 's{/\*.*?\*/}{}gs' foo.c
will work in many but not all cases. You see, it's too simple-minded for
certain kinds of C programs, in particular, those with what appear to be
comments in quoted strings. For that, you'd need something like this,
created by Jeffrey Friedl and later modified by Fred Curtis.
- $/ = undef;
- $_ = <>;
- s#/\*[^*]*\*+([^/*][^*]*\*+)*/|("(\\.|[^"\\])*"|'(\\.|[^'\\])*'|.[^/"'\\]*)#defined $2 ? $2 : ""#gse;
- print;
+ $/ = undef;
+ $_ = <>;
+ s#/\*[^*]*\*+([^/*][^*]*\*+)*/|("(\\.|[^"\\])*"|'(\\.|[^'\\])*'|.[^/"'\\]*)#defined $2 ? $2 : ""#gse;
+ print;
This could, of course, be more legibly written with the C</x> modifier, adding
whitespace and comments. Here it is expanded, courtesy of Fred Curtis.
(contributed by brian d foy)
-Your first try should probably be the C<Text::Balanced> module, which
+Your first try should probably be the L<Text::Balanced> module, which
is in the Perl standard library since Perl 5.8. It has a variety of
-functions to deal with tricky text. The C<Regexp::Common> module can
+functions to deal with tricky text. The L<Regexp::Common> module can
also help by providing canned patterns you can use.
As of Perl 5.10, you can match balanced text with regular expressions
group with one level of nesting and a group with two levels of
nesting. There are five total groups in angle brackets:
- I have some <brackets in <nested brackets> > and
- <another group <nested once <nested twice> > >
- and that's it.
+ I have some <brackets in <nested brackets> > and
+ <another group <nested once <nested twice> > >
+ and that's it.
The regular expression to match the balanced text uses two new (to
Perl 5.10) regular expression features. These are covered in L<perlre>
Putting it all together, you have:
- #!/usr/local/bin/perl5.10.0
-
- my $string =<<"HERE";
- I have some <brackets in <nested brackets> > and
- <another group <nested once <nested twice> > >
- and that's it.
- HERE
-
- my @groups = $string =~ m/
- ( # start of capture group 1
- < # match an opening angle bracket
- (?:
- [^<>]++ # one or more non angle brackets, non backtracking
- |
- (?1) # found < or >, so recurse to capture group 1
- )*
- > # match a closing angle bracket
- ) # end of capture group 1
- /xg;
-
- $" = "\n\t";
- print "Found:\n\t@groups\n";
+ #!/usr/local/bin/perl5.10.0
+
+ my $string =<<"HERE";
+ I have some <brackets in <nested brackets> > and
+ <another group <nested once <nested twice> > >
+ and that's it.
+ HERE
+
+ my @groups = $string =~ m/
+ ( # start of capture group 1
+ < # match an opening angle bracket
+ (?:
+ [^<>]++ # one or more non angle brackets, non backtracking
+ |
+ (?1) # found < or >, so recurse to capture group 1
+ )*
+ > # match a closing angle bracket
+ ) # end of capture group 1
+ /xg;
+
+ $" = "\n\t";
+ print "Found:\n\t@groups\n";
The output shows that Perl found the two major groups:
- Found:
- <brackets in <nested brackets> >
- <another group <nested once <nested twice> > >
+ Found:
+ <brackets in <nested brackets> >
+ <another group <nested once <nested twice> > >
With a little extra work, you can get the all of the groups in angle
brackets even if they are in other angle brackets too. Each time you
just matched so don't match it again) and add it to a queue of strings
to process. Keep doing that until you get no matches:
- #!/usr/local/bin/perl5.10.0
+ #!/usr/local/bin/perl5.10.0
- my @queue =<<"HERE";
- I have some <brackets in <nested brackets> > and
- <another group <nested once <nested twice> > >
- and that's it.
- HERE
+ my @queue =<<"HERE";
+ I have some <brackets in <nested brackets> > and
+ <another group <nested once <nested twice> > >
+ and that's it.
+ HERE
- my $regex = qr/
- ( # start of bracket 1
- < # match an opening angle bracket
- (?:
- [^<>]++ # one or more non angle brackets, non backtracking
- |
- (?1) # recurse to bracket 1
- )*
- > # match a closing angle bracket
- ) # end of bracket 1
- /x;
+ my $regex = qr/
+ ( # start of bracket 1
+ < # match an opening angle bracket
+ (?:
+ [^<>]++ # one or more non angle brackets, non backtracking
+ |
+ (?1) # recurse to bracket 1
+ )*
+ > # match a closing angle bracket
+ ) # end of bracket 1
+ /x;
- $" = "\n\t";
+ $" = "\n\t";
- while( @queue )
- {
- my $string = shift @queue;
+ while( @queue ) {
+ my $string = shift @queue;
- my @groups = $string =~ m/$regex/g;
- print "Found:\n\t@groups\n\n" if @groups;
+ my @groups = $string =~ m/$regex/g;
+ print "Found:\n\t@groups\n\n" if @groups;
- unshift @queue, map { s/^<//; s/>$//; $_ } @groups;
- }
+ unshift @queue, map { s/^<//; s/>$//; $_ } @groups;
+ }
The output shows all of the groups. The outermost matches show up
first and the nested matches so up later:
- Found:
- <brackets in <nested brackets> >
- <another group <nested once <nested twice> > >
+ Found:
+ <brackets in <nested brackets> >
+ <another group <nested once <nested twice> > >
- Found:
- <nested brackets>
+ Found:
+ <nested brackets>
- Found:
- <nested once <nested twice> >
+ Found:
+ <nested once <nested twice> >
- Found:
- <nested twice>
+ Found:
+ <nested twice>
=head2 What does it mean that regexes are greedy? How can I get around it?
X<greedy> X<greediness>
An example:
- $s1 = $s2 = "I am very very cold";
- $s1 =~ s/ve.*y //; # I am cold
- $s2 =~ s/ve.*?y //; # I am very cold
+ $s1 = $s2 = "I am very very cold";
+ $s1 =~ s/ve.*y //; # I am cold
+ $s2 =~ s/ve.*?y //; # I am very cold
Notice how the second substitution stopped matching as soon as it
encountered "y ". The C<*?> quantifier effectively tells the regular
Use the split function:
- while (<>) {
- foreach $word ( split ) {
- # do something with $word here
- }
- }
+ while (<>) {
+ foreach $word ( split ) {
+ # do something with $word here
+ }
+ }
Note that this isn't really a word in the English sense; it's just
chunks of consecutive non-whitespace characters.
To work with only alphanumeric sequences (including underscores), you
might consider
- while (<>) {
- foreach $word (m/(\w+)/g) {
- # do something with $word here
- }
- }
+ while (<>) {
+ foreach $word (m/(\w+)/g) {
+ # do something with $word here
+ }
+ }
=head2 How can I print out a word-frequency or line-frequency summary?
apostrophes, rather than the non-whitespace chunk idea of a word given
in the previous question:
- while (<>) {
- while ( /(\b[^\W_\d][\w'-]+\b)/g ) { # misses "`sheep'"
- $seen{$1}++;
- }
- }
+ while (<>) {
+ while ( /(\b[^\W_\d][\w'-]+\b)/g ) { # misses "`sheep'"
+ $seen{$1}++;
+ }
+ }
- while ( ($word, $count) = each %seen ) {
- print "$count $word\n";
- }
+ while ( ($word, $count) = each %seen ) {
+ print "$count $word\n";
+ }
If you wanted to do the same thing for lines, you wouldn't need a
regular expression:
- while (<>) {
- $seen{$_}++;
- }
+ while (<>) {
+ $seen{$_}++;
+ }
- while ( ($line, $count) = each %seen ) {
- print "$count $line";
- }
+ while ( ($line, $count) = each %seen ) {
+ print "$count $line";
+ }
If you want these output in a sorted order, see L<perlfaq4>: "How do I
sort a hash (optionally by value instead of key)?".
If you have Perl 5.10 or later, this is almost trivial. You just smart
match against an array of regular expression objects:
- my @patterns = ( qr/Fr.d/, qr/B.rn.y/, qr/W.lm./ );
+ my @patterns = ( qr/Fr.d/, qr/B.rn.y/, qr/W.lm./ );
- if( $string ~~ @patterns ) {
- ...
- };
+ if( $string ~~ @patterns ) {
+ ...
+ };
The smart match stops when it finds a match, so it doesn't have to try
every expression.
iteration of the C<foreach> loop since it has no way to know what
C<$pattern> will be:
- my @patterns = qw( foo bar baz );
+ my @patterns = qw( foo bar baz );
- LINE: while( <DATA> ) {
- foreach $pattern ( @patterns ) {
- if( /\b$pattern\b/i ) {
- print;
- next LINE;
- }
- }
- }
+ LINE: while( <DATA> ) {
+ foreach $pattern ( @patterns ) {
+ if( /\b$pattern\b/i ) {
+ print;
+ next LINE;
+ }
+ }
+ }
The C<qr//> operator showed up in perl 5.005. It compiles a regular
expression, but doesn't apply it. When you use the pre-compiled
a C<map> to turn each pattern into its pre-compiled form. The rest of
the script is the same, but faster:
- my @patterns = map { qr/\b$_\b/i } qw( foo bar baz );
+ my @patterns = map { qr/\b$_\b/i } qw( foo bar baz );
- LINE: while( <> ) {
- foreach $pattern ( @patterns ) {
- if( /$pattern/ )
- {
- print;
- next LINE;
- }
- }
- }
+ LINE: while( <> ) {
+ foreach $pattern ( @patterns ) {
+ if( /$pattern/ ) {
+ print;
+ next LINE;
+ }
+ }
+ }
In some cases, you may be able to make several patterns into a single
regular expression. Beware of situations that require backtracking
though.
- my $regex = join '|', qw( foo bar baz );
+ my $regex = join '|', qw( foo bar baz );
- LINE: while( <> ) {
- print if /\b(?:$regex)\b/i;
- }
+ LINE: while( <> ) {
+ print if /\b(?:$regex)\b/i;
+ }
For more details on regular expression efficiency, see I<Mastering
Regular Expressions> by Jeffrey Friedl. He explains how regular
than a word character precedes the "P" and succeeds the "l", the
pattern will match. These strings match /\bPerl\b/.
- "Perl" # no word char before P or after l
- "Perl " # same as previous (space is not a word char)
- "'Perl'" # the ' char is not a word char
- "Perl's" # no word char before P, non-word char after "l"
+ "Perl" # no word char before P or after l
+ "Perl " # same as previous (space is not a word char)
+ "'Perl'" # the ' char is not a word char
+ "Perl's" # no word char before P, non-word char after "l"
These strings do not match /\bPerl\b/.
- "Perl_" # _ is a word char!
- "Perler" # no word char before P, but one after l
+ "Perl_" # _ is a word char!
+ "Perler" # no word char before P, but one after l
You don't have to use \b to match words though. You can look for
non-word characters surrounded by word characters. These strings
match the pattern /\b'\b/.
- "don't" # the ' char is surrounded by "n" and "t"
- "qep'a'" # the ' char is surrounded by "p" and "a"
+ "don't" # the ' char is surrounded by "n" and "t"
+ "qep'a'" # the ' char is surrounded by "p" and "a"
These strings do not match /\b'\b/.
- "foo'" # there is no word char after non-word '
+ "foo'" # there is no word char after non-word '
You can also use the complement of \b, \B, to specify that there
should not be a word boundary.
In the pattern /\Bam\B/, there must be a word character before the "a"
and after the "m". These patterns match /\Bam\B/:
- "llama" # "am" surrounded by word chars
- "Samuel" # same
+ "llama" # "am" surrounded by word chars
+ "Samuel" # same
These strings do not match /\Bam\B/
- "Sam" # no word boundary before "a", but one after "m"
- "I am Sam" # "am" surrounded by non-word chars
+ "Sam" # no word boundary before "a", but one after "m"
+ "I am Sam" # "am" surrounded by non-word chars
=head2 Why does using $&, $`, or $' slow my program down?
to stop at C<a>. Simply matching pairs of digits skips over
the C<a> and still matches C<44>.
- $_ = "1122a44";
- my @pairs = m/(\d\d)/g; # qw( 11 22 44 )
+ $_ = "1122a44";
+ my @pairs = m/(\d\d)/g; # qw( 11 22 44 )
If you use the C<\G> anchor, you force the match after C<22> to
start with the C<a>. The regular expression cannot match
fails and the match operator returns the pairs it already
found.
- $_ = "1122a44";
- my @pairs = m/\G(\d\d)/g; # qw( 11 22 )
+ $_ = "1122a44";
+ my @pairs = m/\G(\d\d)/g; # qw( 11 22 )
You can also use the C<\G> anchor in scalar context. You
still need the C<g> flag.
- $_ = "1122a44";
- while( m/\G(\d\d)/g )
- {
- print "Found $1\n";
- }
+ $_ = "1122a44";
+ while( m/\G(\d\d)/g ) {
+ print "Found $1\n";
+ }
After the match fails at the letter C<a>, perl resets C<pos()>
and the next match on the same string starts at the beginning.
- $_ = "1122a44";
- while( m/\G(\d\d)/g )
- {
- print "Found $1\n";
- }
+ $_ = "1122a44";
+ while( m/\G(\d\d)/g ) {
+ print "Found $1\n";
+ }
- print "Found $1 after while" if m/(\d\d)/g; # finds "11"
+ print "Found $1 after while" if m/(\d\d)/g; # finds "11"
You can disable C<pos()> resets on fail with the C<c> flag, documented
in L<perlop> and L<perlreref>. Subsequent matches start where the last
and since it does not use any anchor it can skip over the C<a> to find
C<44>.
- $_ = "1122a44";
- while( m/\G(\d\d)/gc )
- {
- print "Found $1\n";
- }
+ $_ = "1122a44";
+ while( m/\G(\d\d)/gc ) {
+ print "Found $1\n";
+ }
- print "Found $1 after while" if m/(\d\d)/g; # finds "44"
+ print "Found $1 after while" if m/(\d\d)/g; # finds "44"
Typically you use the C<\G> anchor with the C<c> flag
when you want to try a different match if one fails,
such as in a tokenizer. Jeffrey Friedl offers this example
which works in 5.004 or later.
- while (<>) {
- chomp;
- PARSER: {
- m/ \G( \d+\b )/gcx && do { print "number: $1\n"; redo; };
- m/ \G( \w+ )/gcx && do { print "word: $1\n"; redo; };
- m/ \G( \s+ )/gcx && do { print "space: $1\n"; redo; };
- m/ \G( [^\w\d]+ )/gcx && do { print "other: $1\n"; redo; };
- }
- }
+ while (<>) {
+ chomp;
+ PARSER: {
+ m/ \G( \d+\b )/gcx && do { print "number: $1\n"; redo; };
+ m/ \G( \w+ )/gcx && do { print "word: $1\n"; redo; };
+ m/ \G( \s+ )/gcx && do { print "space: $1\n"; redo; };
+ m/ \G( [^\w\d]+ )/gcx && do { print "other: $1\n"; redo; };
+ }
+ }
For each line, the C<PARSER> loop first tries to match a series
of digits followed by a word boundary. This match has to
and L<Encode>.
If you are stuck with older Perls, you can do Unicode with the
-C<Unicode::String> module, and character conversions using the
-C<Unicode::Map8> and C<Unicode::Map> modules. If you are using
+L<Unicode::String> module, and character conversions using the
+L<Unicode::Map8> and L<Unicode::Map> modules. If you are using
Japanese encodings, you might try using the jperl 5.005_03.
Finally, the following set of approaches was offered by Jeffrey
Here are a few ways, all painful, to deal with it:
- # Make sure adjacent "martian" bytes are no longer adjacent.
- $martian =~ s/([A-Z][A-Z])/ $1 /g;
+ # Make sure adjacent "martian" bytes are no longer adjacent.
+ $martian =~ s/([A-Z][A-Z])/ $1 /g;
- print "found GX!\n" if $martian =~ /GX/;
+ print "found GX!\n" if $martian =~ /GX/;
Or like this:
- @chars = $martian =~ m/([A-Z][A-Z]|[^A-Z])/g;
- # above is conceptually similar to: @chars = $text =~ m/(.)/g;
- #
- foreach $char (@chars) {
- print "found GX!\n", last if $char eq 'GX';
- }
+ @chars = $martian =~ m/([A-Z][A-Z]|[^A-Z])/g;
+ # above is conceptually similar to: @chars = $text =~ m/(.)/g;
+ #
+ foreach $char (@chars) {
+ print "found GX!\n", last if $char eq 'GX';
+ }
Or like this:
- while ($martian =~ m/\G([A-Z][A-Z]|.)/gs) { # \G probably unneeded
- print "found GX!\n", last if $1 eq 'GX';
- }
+ while ($martian =~ m/\G([A-Z][A-Z]|.)/gs) { # \G probably unneeded
+ print "found GX!\n", last if $1 eq 'GX';
+ }
Here's another, slightly less painful, way to do it from Benjamin
Goldberg, who uses a zero-width negative look-behind assertion.
- print "found GX!\n" if $martian =~ m/
- (?<![A-Z])
- (?:[A-Z][A-Z])*?
- GX
- /x;
+ print "found GX!\n" if $martian =~ m/
+ (?<![A-Z])
+ (?:[A-Z][A-Z])*?
+ GX
+ /x;
This succeeds if the "martian" character GX is in the string, and fails
otherwise. If you don't like using (?<!), a zero-width negative
Once you have the pattern in C<$regex>, you use that variable in the
match operator.
- chomp( my $regex = <STDIN> );
+ chomp( my $regex = <STDIN> );
- if( $string =~ m/$regex/ ) { ... }
+ if( $string =~ m/$regex/ ) { ... }
Any regular expression special characters in C<$regex> are still
special, and the pattern still has to be valid or Perl will complain.
For instance, in this pattern there is an unpaired parenthesis.
- my $regex = "Unmatched ( paren";
+ my $regex = "Unmatched ( paren";
- "Two parens to bind them all" =~ m/$regex/;
+ "Two parens to bind them all" =~ m/$regex/;
When Perl compiles the regular expression, it treats the parenthesis
as the start of a memory match. When it doesn't find the closing
parenthesis, it complains:
- Unmatched ( in regex; marked by <-- HERE in m/Unmatched ( <-- HERE paren/ at script line 3.
+ Unmatched ( in regex; marked by <-- HERE in m/Unmatched ( <-- HERE paren/ at script line 3.
You can get around this in several ways depending on our situation.
First, if you don't want any of the characters in the string to be
special, you can escape them with C<quotemeta> before you use the string.
- chomp( my $regex = <STDIN> );
- $regex = quotemeta( $regex );
+ chomp( my $regex = <STDIN> );
+ $regex = quotemeta( $regex );
- if( $string =~ m/$regex/ ) { ... }
+ if( $string =~ m/$regex/ ) { ... }
You can also do this directly in the match operator using the C<\Q>
and C<\E> sequences. The C<\Q> tells Perl where to start escaping
special characters, and the C<\E> tells it where to stop (see L<perlop>
for more details).
- chomp( my $regex = <STDIN> );
+ chomp( my $regex = <STDIN> );
- if( $string =~ m/\Q$regex\E/ ) { ... }
+ if( $string =~ m/\Q$regex\E/ ) { ... }
Alternately, you can use C<qr//>, the regular expression quote operator (see
L<perlop> for more details). It quotes and perhaps compiles the pattern,
and you can apply regular expression flags to the pattern.
- chomp( my $input = <STDIN> );
+ chomp( my $input = <STDIN> );
- my $regex = qr/$input/is;
+ my $regex = qr/$input/is;
- $string =~ m/$regex/ # same as m/$input/is;
+ $string =~ m/$regex/ # same as m/$input/is;
You might also want to trap any errors by wrapping an C<eval> block
around the whole thing.
- chomp( my $input = <STDIN> );
+ chomp( my $input = <STDIN> );
- eval {
- if( $string =~ m/\Q$input\E/ ) { ... }
- };
- warn $@ if $@;
+ eval {
+ if( $string =~ m/\Q$input\E/ ) { ... }
+ };
+ warn $@ if $@;
Or...
- my $regex = eval { qr/$input/is };
- if( defined $regex ) {
- $string =~ m/$regex/;
- }
- else {
- warn $@;
- }
+ my $regex = eval { qr/$input/is };
+ if( defined $regex ) {
+ $string =~ m/$regex/;
+ }
+ else {
+ warn $@;
+ }
=head1 AUTHOR AND COPYRIGHT
They are type specifiers, as detailed in L<perldata>:
- $ for scalar values (number, string or reference)
- @ for arrays
- % for hashes (associative arrays)
- & for subroutines (aka functions, procedures, methods)
- * for all types of that symbol name. In version 4 you used them like
- pointers, but in modern perls you can just use references.
+ $ for scalar values (number, string or reference)
+ @ for arrays
+ % for hashes (associative arrays)
+ & for subroutines (aka functions, procedures, methods)
+ * for all types of that symbol name. In version 4 you used them like
+ pointers, but in modern perls you can just use references.
There are a couple of other symbols that
you're likely to encounter that aren't
really type specifiers:
- <> are used for inputting a record from a filehandle.
- \ takes a reference to something.
+ <> are used for inputting a record from a filehandle.
+ \ takes a reference to something.
Note that <FILE> is I<neither> the type specifier for files
nor the name of the handle. It is the C<< <> >> operator applied
operand to the C<< => >> operator both
count as though they were quoted:
- This is like this
- ------------ ---------------
- $foo{line} $foo{'line'}
- bar => stuff 'bar' => stuff
+ This is like this
+ ------------ ---------------
+ $foo{line} $foo{'line'}
+ bar => stuff 'bar' => stuff
The final semicolon in a block is optional, as is the final comma in a
list. Good style (see L<perlstyle>) says to put them in except for
one-liners:
- if ($whoops) { exit 1 }
- @nums = (1, 2, 3);
+ if ($whoops) { exit 1 }
+ @nums = (1, 2, 3);
- if ($whoops) {
- exit 1;
- }
+ if ($whoops) {
+ exit 1;
+ }
- @lines = (
- "There Beren came from mountains cold",
- "And lost he wandered under leaves",
- );
+ @lines = (
+ "There Beren came from mountains cold",
+ "And lost he wandered under leaves",
+ );
=head2 How do I skip some return values?
One way is to treat the return values as a list and index into it:
- $dir = (getpwnam($user))[7];
+ $dir = (getpwnam($user))[7];
Another way is to use undef as an element on the left-hand-side:
- ($dev, $ino, undef, undef, $uid, $gid) = stat($file);
+ ($dev, $ino, undef, undef, $uid, $gid) = stat($file);
You can also use a list slice to select only the elements that
you need:
- ($dev, $ino, $uid, $gid) = ( stat($file) )[0,1,4,5];
+ ($dev, $ino, $uid, $gid) = ( stat($file) )[0,1,4,5];
=head2 How do I temporarily block warnings?
allows fine control of what warning are produced.
See L<perllexwarn> for more details.
- {
- no warnings; # temporarily turn off warnings
- $a = $b + $c; # I know these might be undef
- }
+ {
+ no warnings; # temporarily turn off warnings
+ $a = $b + $c; # I know these might be undef
+ }
Additionally, you can enable and disable categories of warnings.
You turn off the categories you want to ignore and you can still
get other categories of warnings. See L<perllexwarn> for the
complete details, including the category names and hierarchy.
- {
- no warnings 'uninitialized';
- $a = $b + $c;
- }
+ {
+ no warnings 'uninitialized';
+ $a = $b + $c;
+ }
If you have an older version of Perl, the C<$^W> variable (documented
in L<perlvar>) controls runtime warnings for a block:
- {
- local $^W = 0; # temporarily turn off warnings
- $a = $b + $c; # I know these might be undef
- }
+ {
+ local $^W = 0; # temporarily turn off warnings
+ $a = $b + $c; # I know these might be undef
+ }
Note that like all the punctuation variables, you cannot currently
use my() on C<$^W>, only local().
A common mistake is to write:
- unlink $file || die "snafu";
+ unlink $file || die "snafu";
This gets interpreted as:
- unlink ($file || die "snafu");
+ unlink ($file || die "snafu");
To avoid this problem, either put in extra parentheses or use the
super low precedence C<or> operator:
- (unlink $file) || die "snafu";
- unlink $file or die "snafu";
+ (unlink $file) || die "snafu";
+ unlink $file or die "snafu";
The "English" operators (C<and>, C<or>, C<xor>, and C<not>)
deliberately have precedence lower than that of list operators for
produces an lvalue. This assigns $x to either $a or $b, depending
on the trueness of $maybe:
- ($maybe ? $a : $b) = $x;
+ ($maybe ? $a : $b) = $x;
=head2 How do I declare/create a structure?
anonymous) hash reference. See L<perlref> and L<perldsc> for details.
Here's an example:
- $person = {}; # new anonymous hash
- $person->{AGE} = 24; # set field AGE to 24
- $person->{NAME} = "Nat"; # set field NAME to "Nat"
+ $person = {}; # new anonymous hash
+ $person->{AGE} = 24; # set field AGE to 24
+ $person->{NAME} = "Nat"; # set field NAME to "Nat"
If you're looking for something a bit more rigorous, try L<perltoot>.
Here's a classic non-closure function-generating function:
- sub add_function_generator {
- return sub { shift() + shift() };
- }
+ sub add_function_generator {
+ return sub { shift() + shift() };
+ }
- $add_sub = add_function_generator();
- $sum = $add_sub->(4,5); # $sum is 9 now.
+ $add_sub = add_function_generator();
+ $sum = $add_sub->(4,5); # $sum is 9 now.
The anonymous subroutine returned by add_function_generator() isn't
technically a closure because it refers to no lexicals outside its own
that Perl return a proper closure, thus locking in for all time the
value that the lexical had when the function was created.
- sub make_adder {
- my $addpiece = shift;
- return sub { shift() + $addpiece };
- }
+ sub make_adder {
+ my $addpiece = shift;
+ return sub { shift() + $addpiece };
+ }
- $f1 = make_adder(20);
- $f2 = make_adder(555);
+ $f1 = make_adder(20);
+ $f2 = make_adder(555);
Now C<&$f1($n)> is always 20 plus whatever $n you pass in, whereas
C<&$f2($n)> is always 555 plus whatever $n you pass in. The $addpiece
Closures are often used for less esoteric purposes. For example, when
you want to pass in a bit of code into a function:
- my $line;
- timeout( 30, sub { $line = <STDIN> } );
+ my $line;
+ timeout( 30, sub { $line = <STDIN> } );
If the code to execute had been passed in as a string,
C<< '$line = <STDIN>' >>, there would have been no way for the
sure a variable doesn't get meddled with during the lifetime of the
package:
- BEGIN {
- my $id = 0;
- sub next_id { ++$id }
- }
+ BEGIN {
+ my $id = 0;
+ sub next_id { ++$id }
+ }
This is discussed in more detail in L<perlsub>; see the entry on
I<Persistent Private Variables>.
and subroutine arguments. It used to be easy to inadvertently lose a
variable's value this way, but now it's much harder. Take this code:
- my $f = 'foo';
- sub T {
- while ($i++ < 3) { my $f = $f; $f .= "bar"; print $f, "\n" }
- }
+ my $f = 'foo';
+ sub T {
+ while ($i++ < 3) { my $f = $f; $f .= "bar"; print $f, "\n" }
+ }
- T;
- print "Finally $f\n";
+ T;
+ print "Finally $f\n";
If you are experiencing variable suicide, that C<my $f> in the subroutine
doesn't pick up a fresh copy of the C<$f> whose value is <foo>. The output
shows that inside the subroutine the value of C<$f> leaks through when it
shouldn't, as in this output:
- foobar
- foobarbar
- foobarbarbar
- Finally foo
+ foobar
+ foobarbar
+ foobarbarbar
+ Finally foo
The $f that has "bar" added to it three times should be a new C<$f>
C<my $f> should create a new lexical variable each time through the loop.
The expected output is:
- foobar
- foobar
- foobar
- Finally foo
+ foobar
+ foobar
+ foobar
+ Finally foo
=head2 How can I pass/return a {Function, FileHandle, Array, Hash, Method, Regex}?
Regular variables and functions are quite easy to pass: just pass in a
reference to an existing or anonymous variable or function:
- func( \$some_scalar );
+ func( \$some_scalar );
- func( \@some_array );
- func( [ 1 .. 10 ] );
+ func( \@some_array );
+ func( [ 1 .. 10 ] );
- func( \%some_hash );
- func( { this => 10, that => 20 } );
+ func( \%some_hash );
+ func( { this => 10, that => 20 } );
- func( \&some_func );
- func( sub { $_[0] ** $_[1] } );
+ func( \&some_func );
+ func( sub { $_[0] ** $_[1] } );
=item Passing Filehandles
As of Perl 5.6, you can represent filehandles with scalar variables
which you treat as any other scalar.
- open my $fh, $filename or die "Cannot open $filename! $!";
- func( $fh );
+ open my $fh, $filename or die "Cannot open $filename! $!";
+ func( $fh );
- sub func {
- my $passed_fh = shift;
+ sub func {
+ my $passed_fh = shift;
- my $line = <$passed_fh>;
- }
+ my $line = <$passed_fh>;
+ }
Before Perl 5.6, you had to use the C<*FH> or C<\*FH> notations.
These are "typeglobs"--see L<perldata/"Typeglobs and Filehandles">
for it to match against. You construct the pattern with the C<qr//>
operator:
- sub compare($$) {
- my ($val1, $regex) = @_;
- my $retval = $val1 =~ /$regex/;
- return $retval;
- }
- $match = compare("old McDonald", qr/d.*D/i);
+ sub compare($$) {
+ my ($val1, $regex) = @_;
+ my $retval = $val1 =~ /$regex/;
+ return $retval;
+ }
+ $match = compare("old McDonald", qr/d.*D/i);
=item Passing Methods
To pass an object method into a subroutine, you can do this:
- call_a_lot(10, $some_obj, "methname")
- sub call_a_lot {
- my ($count, $widget, $trick) = @_;
- for (my $i = 0; $i < $count; $i++) {
- $widget->$trick();
- }
- }
+ call_a_lot(10, $some_obj, "methname")
+ sub call_a_lot {
+ my ($count, $widget, $trick) = @_;
+ for (my $i = 0; $i < $count; $i++) {
+ $widget->$trick();
+ }
+ }
Or, you can use a closure to bundle up the object, its
method call, and arguments:
- my $whatnot = sub { $some_obj->obfuscate(@args) };
- func($whatnot);
- sub func {
- my $code = shift;
- &$code();
- }
+ my $whatnot = sub { $some_obj->obfuscate(@args) };
+ func($whatnot);
+ sub func {
+ my $code = shift;
+ &$code();
+ }
You could also investigate the can() method in the UNIVERSAL class
(part of the standard perl distribution).
declaration creates the lexical variable that persists between calls
to the subroutine:
- sub counter { state $count = 1; $counter++ }
+ sub counter { state $count = 1; $counter++ }
You can fake a static variable by using a lexical variable which goes
out of scope. In this example, you define the subroutine C<counter>, and
The data in chunk of memory defined by C<$count> is private to
C<counter>.
- BEGIN {
- my $count = 1;
- sub counter { $count++ }
- }
+ BEGIN {
+ my $count = 1;
+ sub counter { $count++ }
+ }
- my $start = counter();
+ my $start = counter();
- .... # code that calls counter();
+ .... # code that calls counter();
- my $end = counter();
+ my $end = counter();
In the previous example, you created a function-private variable
because only one function remembered its reference. You could define
They can both access C<$count>, and since it has gone out of scope,
there is no other way to access it.
- BEGIN {
- my $count = 1;
- sub increment_count { $count++ }
- sub return_count { $count }
- }
+ BEGIN {
+ my $count = 1;
+ sub increment_count { $count++ }
+ sub return_count { $count }
+ }
To declare a file-private variable, you still use a lexical variable.
A file is also a scope, so a lexical variable defined in the file
For instance:
- sub visible {
- print "var has value $var\n";
- }
+ sub visible {
+ print "var has value $var\n";
+ }
- sub dynamic {
- local $var = 'local'; # new temporary value for the still-global
- visible(); # variable called $var
- }
+ sub dynamic {
+ local $var = 'local'; # new temporary value for the still-global
+ visible(); # variable called $var
+ }
- sub lexical {
- my $var = 'private'; # new private variable, $var
- visible(); # (invisible outside of sub scope)
- }
+ sub lexical {
+ my $var = 'private'; # new private variable, $var
+ visible(); # (invisible outside of sub scope)
+ }
- $var = 'global';
+ $var = 'global';
- visible(); # prints global
- dynamic(); # prints local
- lexical(); # prints global
+ visible(); # prints global
+ dynamic(); # prints local
+ lexical(); # prints global
Notice how at no point does the value "private" get printed. That's
because $var only has that value within the block of the lexical()
in the current package, but rather the one in the "main" package, as
though you had written $main::var.
- use vars '$var';
- local $var = "global";
- my $var = "lexical";
+ use vars '$var';
+ local $var = "global";
+ my $var = "lexical";
- print "lexical is $var\n";
- print "global is $main::var\n";
+ print "lexical is $var\n";
+ print "global is $main::var\n";
Alternatively you can use the compiler directive our() to bring a
dynamic variable into the current lexical scope.
- require 5.006; # our() did not exist before 5.6
- use vars '$var';
+ require 5.006; # our() did not exist before 5.6
+ use vars '$var';
- local $var = "global";
- my $var = "lexical";
+ local $var = "global";
+ my $var = "lexical";
- print "lexical is $var\n";
+ print "lexical is $var\n";
- {
- our $var;
- print "global is $var\n";
- }
+ {
+ our $var;
+ print "global is $var\n";
+ }
=head2 What's the difference between deep and shallow binding?
To enforce scalar context in this particular case, however, you need
merely omit the parentheses:
- local($foo) = <FILE>; # WRONG
- local($foo) = scalar(<FILE>); # ok
- local $foo = <FILE>; # right
+ local($foo) = <FILE>; # WRONG
+ local($foo) = scalar(<FILE>); # ok
+ local $foo = <FILE>; # right
You should probably be using lexical variables anyway, although the
issue is the same here:
- my($foo) = <FILE>; # WRONG
- my $foo = <FILE>; # right
+ my($foo) = <FILE>; # WRONG
+ my $foo = <FILE>; # right
=head2 How do I redefine a builtin function, operator, or method?
list, C<@_>. Here's an example; the C<bar> subroutine calls C<&foo>,
which prints its arguments list:
- sub bar { &foo }
+ sub bar { &foo }
- sub foo { print "Args in foo are: @_\n" }
+ sub foo { print "Args in foo are: @_\n" }
- bar( qw( a b c ) );
+ bar( qw( a b c ) );
When you call C<bar> with arguments, you see that C<foo> got the same C<@_>:
- Args in foo are: a b c
+ Args in foo are: a b c
Calling the subroutine with trailing parentheses, with or without arguments,
does not use the current C<@_> and respects the subroutine prototype. Changing
the example to put parentheses after the call to C<foo> changes the program:
- sub bar { &foo() }
+ sub bar { &foo() }
- sub foo { print "Args in foo are: @_\n" }
+ sub foo { print "Args in foo are: @_\n" }
- bar( qw( a b c ) );
+ bar( qw( a b c ) );
Now the output shows that C<foo> doesn't get the C<@_> from its caller.
- Args in foo are:
+ Args in foo are:
The main use of the C<@_> pass-through feature is to write subroutines
whose main job it is to call other subroutines for you. For further
In Perl 5.10, use the C<given-when> construct described in L<perlsyn>:
- use 5.010;
+ use 5.010;
- given ( $string ) {
- when( 'Fred' ) { say "I found Fred!" }
- when( 'Barney' ) { say "I found Barney!" }
- when( /Bamm-?Bamm/ ) { say "I found Bamm-Bamm!" }
- default { say "I don't recognize the name!" }
- };
+ given ( $string ) {
+ when( 'Fred' ) { say "I found Fred!" }
+ when( 'Barney' ) { say "I found Barney!" }
+ when( /Bamm-?Bamm/ ) { say "I found Bamm-Bamm!" }
+ default { say "I don't recognize the name!" }
+ };
If one wants to use pure Perl and to be compatible with Perl versions
prior to 5.10, the general answer is to use C<if-elsif-else>:
- for ($variable_to_test) {
- if (/pat1/) { } # do something
- elsif (/pat2/) { } # do something else
- elsif (/pat3/) { } # do something else
- else { } # default
- }
+ for ($variable_to_test) {
+ if (/pat1/) { } # do something
+ elsif (/pat2/) { } # do something else
+ elsif (/pat3/) { } # do something else
+ else { } # default
+ }
Here's a simple example of a switch based on pattern matching,
lined up in a way to make it look more like a switch statement.
SWITCH: for (ref $whatchamacallit) {
- /^$/ && die "not a reference";
+ /^$/ && die "not a reference";
- /SCALAR/ && do {
- print_scalar($$ref);
- last SWITCH;
- };
+ /SCALAR/ && do {
+ print_scalar($$ref);
+ last SWITCH;
+ };
- /ARRAY/ && do {
- print_array(@$ref);
- last SWITCH;
- };
+ /ARRAY/ && do {
+ print_array(@$ref);
+ last SWITCH;
+ };
- /HASH/ && do {
- print_hash(%$ref);
- last SWITCH;
- };
+ /HASH/ && do {
+ print_hash(%$ref);
+ last SWITCH;
+ };
- /CODE/ && do {
- warn "can't print function ref";
- last SWITCH;
- };
+ /CODE/ && do {
+ warn "can't print function ref";
+ last SWITCH;
+ };
- # DEFAULT
+ # DEFAULT
- warn "User defined type skipped";
+ warn "User defined type skipped";
}
one takes precedence over another, as C<"SEND"> has precedence over
C<"STOP"> here:
- chomp($answer = <>);
- if ("SEND" =~ /^\Q$answer/i) { print "Action is send\n" }
- elsif ("STOP" =~ /^\Q$answer/i) { print "Action is stop\n" }
- elsif ("ABORT" =~ /^\Q$answer/i) { print "Action is abort\n" }
- elsif ("LIST" =~ /^\Q$answer/i) { print "Action is list\n" }
- elsif ("EDIT" =~ /^\Q$answer/i) { print "Action is edit\n" }
+ chomp($answer = <>);
+ if ("SEND" =~ /^\Q$answer/i) { print "Action is send\n" }
+ elsif ("STOP" =~ /^\Q$answer/i) { print "Action is stop\n" }
+ elsif ("ABORT" =~ /^\Q$answer/i) { print "Action is abort\n" }
+ elsif ("LIST" =~ /^\Q$answer/i) { print "Action is list\n" }
+ elsif ("EDIT" =~ /^\Q$answer/i) { print "Action is edit\n" }
A totally different approach is to create a hash of function references.
- my %commands = (
- "happy" => \&joy,
- "sad", => \&sullen,
- "done" => sub { die "See ya!" },
- "mad" => \&angry,
- );
-
- print "How are you? ";
- chomp($string = <STDIN>);
- if ($commands{$string}) {
- $commands{$string}->();
- } else {
- print "No such command: $string\n";
- }
+ my %commands = (
+ "happy" => \&joy,
+ "sad", => \&sullen,
+ "done" => sub { die "See ya!" },
+ "mad" => \&angry,
+ );
+
+ print "How are you? ";
+ chomp($string = <STDIN>);
+ if ($commands{$string}) {
+ $commands{$string}->();
+ } else {
+ print "No such command: $string\n";
+ }
Starting from Perl 5.8, a source filter module, C<Switch>, can also be
used to get switch and case. Its use is now discouraged, because it's
When it comes to undefined variables that would trigger a warning
under C<use warnings>, you can promote the warning to an error.
- use warnings FATAL => qw(uninitialized);
+ use warnings FATAL => qw(uninitialized);
=head2 Why can't a method included in this same file be found?
special literals as separate tokens, so you can't interpolate them
into strings like you can with variables:
- my $current_package = __PACKAGE__;
- print "I am in package $current_package\n";
+ my $current_package = __PACKAGE__;
+ print "I am in package $current_package\n";
If you want to find the package calling your code, perhaps to give better
-diagnostics as C<Carp> does, use the C<caller> built-in:
+diagnostics as L<Carp> does, use the C<caller> built-in:
- sub foo {
- my @args = ...;
- my( $package, $filename, $line ) = caller;
+ sub foo {
+ my @args = ...;
+ my( $package, $filename, $line ) = caller;
- print "I was called from package $package\n";
- );
+ print "I was called from package $package\n";
+ );
By default, your program starts in package C<main>, so you will
always be in some package.
This is different from finding out the package an object is blessed
into, which might not be the current package. For that, use C<blessed>
-from C<Scalar::Util>, part of the Standard Library since Perl 5.8:
+from L<Scalar::Util>, part of the Standard Library since Perl 5.8:
- use Scalar::Util qw(blessed);
- my $object_package = blessed( $object );
+ use Scalar::Util qw(blessed);
+ my $object_package = blessed( $object );
Most of the time, you shouldn't care what package an object is blessed
into, however, as long as it claims to inherit from that class:
- my $is_right_class = eval { $object->isa( $package ) }; # true or false
+ my $is_right_class = eval { $object->isa( $package ) }; # true or false
And, with Perl 5.10 and later, you don't have to check for an
inheritance to see if the object can handle a role. For that, you can
use C<DOES>, which comes from C<UNIVERSAL>:
- my $class_does_it = eval { $object->DOES( $role ) }; # true or false
+ my $class_does_it = eval { $object->DOES( $role ) }; # true or false
You can safely replace C<isa> with C<DOES> (although the converse is not true).
expects a new statement (so not in the middle of statements like the #
comments). You end the comment with C<=cut>, ending the Pod section:
- =pod
+ =pod
- my $object = NotGonnaHappen->new();
+ my $object = NotGonnaHappen->new();
- ignored_sub();
+ ignored_sub();
- $wont_be_assigned = 37;
+ $wont_be_assigned = 37;
- =cut
+ =cut
The quick-and-dirty method only works well when you don't plan to
leave the commented code in the source. If a Pod parser comes along,
same label. You still need the C<=cut> to go back to Perl code from
the Pod comment:
- =begin comment
+ =begin comment
- my $object = NotGonnaHappen->new();
+ my $object = NotGonnaHappen->new();
- ignored_sub();
+ ignored_sub();
- $wont_be_assigned = 37;
+ $wont_be_assigned = 37;
- =end comment
+ =end comment
- =cut
+ =cut
For more information on Pod, check out L<perlpod> and L<perlpodspec>.
Use this code, provided by Mark-Jason Dominus:
- sub scrub_package {
- no strict 'refs';
- my $pack = shift;
- die "Shouldn't delete main package"
- if $pack eq "" || $pack eq "main";
- my $stash = *{$pack . '::'}{HASH};
- my $name;
- foreach $name (keys %$stash) {
- my $fullname = $pack . '::' . $name;
- # Get rid of everything with that name.
- undef $$fullname;
- undef @$fullname;
- undef %$fullname;
- undef &$fullname;
- undef *$fullname;
- }
- }
+ sub scrub_package {
+ no strict 'refs';
+ my $pack = shift;
+ die "Shouldn't delete main package"
+ if $pack eq "" || $pack eq "main";
+ my $stash = *{$pack . '::'}{HASH};
+ my $name;
+ foreach $name (keys %$stash) {
+ my $fullname = $pack . '::' . $name;
+ # Get rid of everything with that name.
+ undef $$fullname;
+ undef @$fullname;
+ undef %$fullname;
+ undef &$fullname;
+ undef *$fullname;
+ }
+ }
Or, if you're using a recent release of Perl, you can
just use the Symbol::delete_package() function instead.
Beginners often think they want to have a variable contain the name
of a variable.
- $fred = 23;
- $varname = "fred";
- ++$$varname; # $fred now 24
+ $fred = 23;
+ $varname = "fred";
+ ++$$varname; # $fred now 24
This works I<sometimes>, but it is a very bad idea for two reasons.
(like C<%main::>) instead of a user-defined hash. The solution is to
use your own hash or a real reference instead.
- $USER_VARS{"fred"} = 23;
- $varname = "fred";
- $USER_VARS{$varname}++; # not $$varname++
+ $USER_VARS{"fred"} = 23;
+ $varname = "fred";
+ $USER_VARS{$varname}++; # not $$varname++
There we're using the %USER_VARS hash instead of symbolic references.
Sometimes this comes up in reading strings from the user with variable
reading a string and expanding it to the actual contents of your program's
own variables:
- $str = 'this has a $fred and $barney in it';
- $str =~ s/(\$\w+)/$1/eeg; # need double eval
+ $str = 'this has a $fred and $barney in it';
+ $str =~ s/(\$\w+)/$1/eeg; # need double eval
it would be better to keep a hash around like %USER_VARS and have
variable references actually refer to entries in that hash:
- $str =~ s/\$(\w+)/$USER_VARS{$1}/g; # no /e here at all
+ $str =~ s/\$(\w+)/$USER_VARS{$1}/g; # no /e here at all
That's faster, cleaner, and safer than the previous approach. Of course,
you don't need to use a dollar sign. You could use your own scheme to
make it less confusing, like bracketed percent symbols, etc.
- $str = 'this has a %fred% and %barney% in it';
- $str =~ s/%(\w+)%/$USER_VARS{$1}/g; # no /e here at all
+ $str = 'this has a %fred% and %barney% in it';
+ $str =~ s/%(\w+)%/$USER_VARS{$1}/g; # no /e here at all
Another reason that folks sometimes think they want a variable to
contain the name of a variable is that they don't know how to build
wanted two hashes in their program: %fred and %barney, and that they
wanted to use another scalar variable to refer to those by name.
- $name = "fred";
- $$name{WIFE} = "wilma"; # set %fred
+ $name = "fred";
+ $$name{WIFE} = "wilma"; # set %fred
- $name = "barney";
- $$name{WIFE} = "betty"; # set %barney
+ $name = "barney";
+ $$name{WIFE} = "betty"; # set %barney
This is still a symbolic reference, and is still saddled with the
problems enumerated above. It would be far better to write:
- $folks{"fred"}{WIFE} = "wilma";
- $folks{"barney"}{WIFE} = "betty";
+ $folks{"fred"}{WIFE} = "wilma";
+ $folks{"barney"}{WIFE} = "betty";
And just use a multilevel hash to start with.
In those cases, you would turn off C<strict 'refs'> temporarily so you
can play around with the symbol table. For example:
- @colors = qw(red blue green yellow orange purple violet);
- for my $name (@colors) {
- no strict 'refs'; # renege for the block
- *$name = sub { "<FONT COLOR='$name'>@_</FONT>" };
- }
+ @colors = qw(red blue green yellow orange purple violet);
+ for my $name (@colors) {
+ no strict 'refs'; # renege for the block
+ *$name = sub { "<FONT COLOR='$name'>@_</FONT>" };
+ }
All those functions (red(), blue(), green(), etc.) appear to be separate,
but the real code in the closure actually was compiled only once.
In either case, you should still be able to run the scripts with perl
explicitly:
- % perl script.pl
+ % perl script.pl
If you get a message like "perl: command not found", perl is not in
your PATH, which might also mean that the location of perl is not
=item Keyboard
- Term::Cap Standard perl distribution
- Term::ReadKey CPAN
- Term::ReadLine::Gnu CPAN
- Term::ReadLine::Perl CPAN
- Term::Screen CPAN
+ Term::Cap Standard perl distribution
+ Term::ReadKey CPAN
+ Term::ReadLine::Gnu CPAN
+ Term::ReadLine::Perl CPAN
+ Term::Screen CPAN
=item Screen
- Term::Cap Standard perl distribution
- Curses CPAN
- Term::ANSIColor CPAN
+ Term::Cap Standard perl distribution
+ Curses CPAN
+ Term::ANSIColor CPAN
=item Mouse
- Tk CPAN
+ Tk CPAN
=back
In general, you don't, because you don't know whether
the recipient has a color-aware display device. If you
know that they have an ANSI terminal that understands
-color, you can use the C<Term::ANSIColor> module from CPAN:
+color, you can use the L<Term::ANSIColor> module from CPAN:
- use Term::ANSIColor;
- print color("red"), "Stop!\n", color("reset");
- print color("green"), "Go!\n", color("reset");
+ use Term::ANSIColor;
+ print color("red"), "Stop!\n", color("reset");
+ print color("green"), "Go!\n", color("reset");
Or like this:
- use Term::ANSIColor qw(:constants);
- print RED, "Stop!\n", RESET;
- print GREEN, "Go!\n", RESET;
+ use Term::ANSIColor qw(:constants);
+ print RED, "Stop!\n", RESET;
+ print GREEN, "Go!\n", RESET;
=head2 How do I read just one key without waiting for a return key?
L<perlfunc/getc>, but as you see, that's already getting you into
portability snags.
- open(TTY, "+</dev/tty") or die "no tty: $!";
- system "stty cbreak </dev/tty >/dev/tty 2>&1";
- $key = getc(TTY); # perhaps this works
- # OR ELSE
- sysread(TTY, $key, 1); # probably this does
- system "stty -cbreak </dev/tty >/dev/tty 2>&1";
+ open(TTY, "+</dev/tty") or die "no tty: $!";
+ system "stty cbreak </dev/tty >/dev/tty 2>&1";
+ $key = getc(TTY); # perhaps this works
+ # OR ELSE
+ sysread(TTY, $key, 1); # probably this does
+ system "stty -cbreak </dev/tty >/dev/tty 2>&1";
-The C<Term::ReadKey> module from CPAN offers an easy-to-use interface that
+The L<Term::ReadKey> module from CPAN offers an easy-to-use interface that
should be more efficient than shelling out to B<stty> for each key.
It even includes limited support for Windows.
- use Term::ReadKey;
- ReadMode('cbreak');
- $key = ReadKey(0);
- ReadMode('normal');
+ use Term::ReadKey;
+ ReadMode('cbreak');
+ $key = ReadKey(0);
+ ReadMode('normal');
However, using the code requires that you have a working C compiler
and can use it to build and install a CPAN module. Here's a solution
-using the standard C<POSIX> module, which is already on your system
+using the standard L<POSIX> module, which is already on your system
(assuming your system supports POSIX).
- use HotKey;
- $key = readkey();
+ use HotKey;
+ $key = readkey();
And here's the C<HotKey> module, which hides the somewhat mystifying calls
to manipulate the POSIX termios structures.
- # HotKey.pm
- package HotKey;
+ # Hot