This section deals with questions related to networking, the internet,
and a few on the web.
10 =head2 What is the correct form of response from a CGI script?
12 (Alan Flavell <flavell+www@a5.ph.gla.ac.uk> answers...)
14 The Common Gateway Interface (CGI) specifies a software interface between
15 a program ("CGI script") and a web server (HTTPD). It is not specific
16 to Perl, and has its own FAQs and tutorials, and usenet group,
17 comp.infosystems.www.authoring.cgi
19 The CGI specification is outlined in an informational RFC:
20 http://www.ietf.org/rfc/rfc3875
22 These Perl FAQs very selectively cover some CGI issues. However, Perl
23 programmers are strongly advised to use the C<CGI.pm> module, to take care
24 of the details for them.
26 The similarity between CGI response headers (defined in the CGI
27 specification) and HTTP response headers (defined in the HTTP
28 specification, RFC2616) is intentional, but can sometimes be confusing.
30 The CGI specification defines two kinds of script: the "Parsed Header"
31 script, and the "Non Parsed Header" (NPH) script. Check your server
32 documentation to see what it supports. "Parsed Header" scripts are
33 simpler in various respects. The CGI specification allows any of the
34 usual newline representations in the CGI response (it's the server's
35 job to create an accurate HTTP response based on it). So "\n" written in
36 text mode is technically correct, and recommended. NPH scripts are more
37 tricky: they must put out a complete and accurate set of HTTP
38 transaction response headers; the HTTP specification calls for records
to be terminated with carriage-return and line-feed, i.e. ASCII \015\012
40 written in binary mode.
42 Using C<CGI.pm> gives excellent platform independence, including EBCDIC
43 systems. C<CGI.pm> selects an appropriate newline representation
44 (C<$CGI::CRLF>) and sets binmode as appropriate.
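As a minimal sketch of the "Parsed Header" case described above, the
following builds a response by hand: header lines ending in "\n", one
empty line, then the body. The helper name C<cgi_response> and the sample
body are illustrative assumptions, not part of C<CGI.pm> or any other module.

```perl
use strict;
use warnings;

# Hypothetical helper: assemble a Parsed Header CGI response.
# Each header line ends in "\n"; one empty line separates the
# headers from the body. The server turns this into real HTTP.
sub cgi_response {
    my ($type, $body) = @_;
    return "Content-Type: $type\n" . "\n" . $body;
}

my $response = cgi_response( 'text/html', "<p>Hello, world</p>\n" );
print $response;
```

An NPH script could not use this as-is: it would also have to emit the HTTP
status line and terminate each header with \015\012.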
46 =head2 My CGI script runs from the command line but not the browser. (500 Server Error)
48 (contributed by brian d foy)
There are many things that might be wrong with your CGI program, and only
some of them might be related to Perl. Try going through the troubleshooting
guide on PerlMonks:

    http://www.perlmonks.org/?node_id=380424
56 =head2 How can I get better error messages from a CGI program?
Use the C<CGI::Carp> module. It replaces C<warn> and C<die>, plus the
normal C<Carp> module's C<carp>, C<croak>, and C<confess> functions, with
more verbose and safer versions. It still sends them to the normal
server error log.

    use CGI::Carp;
    warn "This is a complaint";
    die "But this one is serious";
67 The following use of C<CGI::Carp> also redirects errors to a file of your choice,
68 placed in a C<BEGIN> block to catch compile-time warnings as well:
    BEGIN {
        use CGI::Carp qw(carpout);
        open(LOG, ">>/var/local/cgi-logs/mycgi-log")
            or die "Unable to append to mycgi-log: $!\n";
        carpout(*LOG);    # send warnings and errors to the log file
    }
77 You can even arrange for fatal errors to go back to the client browser,
78 which is nice for your own debugging, but might confuse the end user.
80 use CGI::Carp qw(fatalsToBrowser);
83 Even if the error happens before you get the HTTP header out, the module
84 will try to take care of this to avoid the dreaded server 500 errors.
Normal warnings still go out to the server error log (or wherever
you've sent them with C<carpout>) with the application name and date
stamp prepended.
89 =head2 How do I remove HTML from a string?
91 The most correct way (albeit not the fastest) is to use C<HTML::Parser>
92 from CPAN. Another mostly correct
93 way is to use C<HTML::FormatText> which not only removes HTML but also
94 attempts to do a little simple formatting of the resulting plain text.
96 Many folks attempt a simple-minded regular expression approach, like
97 C<< s/<.*?>//g >>, but that fails in many cases because the tags
98 may continue over line breaks, they may contain quoted angle-brackets,
or HTML comments may be present. Plus, folks forget to convert
entities--like C<&lt;> for example.
102 Here's one "simple-minded" approach, that works for most files:
104 #!/usr/bin/perl -p0777
105 s/<(?:[^>'"]*|(['"]).*?\g1)*>//gs
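To make the difference concrete, here is a small sketch that runs both the
naive pattern and the quote-aware pattern over one of the tricky inputs (an
attribute value containing C<< > >>). The sample string and variable names
are made up for illustration.

```perl
use strict;
use warnings;

my $html = q{<IMG SRC = "foo.gif" ALT = "A > B">caption};

# Naive pattern: stops at the first ">", which sits inside the ALT value.
( my $naive = $html ) =~ s/<.*?>//g;

# Quote-aware pattern: skips over quoted attribute values (needs Perl 5.10+
# for \g1).
( my $better = $html ) =~ s/<(?:[^>'"]*|(['"]).*?\g1)*>//gs;

print "naive:  [$naive]\n";     # part of the tag is left behind
print "better: [$better]\n";    # only the text after the tag remains
```

Neither pattern handles comments, scripts, or CDATA sections, which is why a
real parser is still the safer choice.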
If you want a more complete solution, see the 3-stage striphtml
program in

    http://www.cpan.org/authors/Tom_Christiansen/scripts/striphtml.gz
Here are some tricky cases that you should think about when picking
a solution:
115 <IMG SRC = "foo.gif" ALT = "A > B">
122 <script>if (a<b && a>c)</script>
126 <![INCLUDE CDATA [ >>>>>>>>>>>> ]]>
If HTML comments include other tags, those solutions would also break
on text like this:
131 <!-- This section commented out.
132 <B>You can't see me!</B>
135 =head2 How do I extract URLs?
137 You can easily extract all sorts of URLs from HTML with
138 C<HTML::SimpleLinkExtor> which handles anchors, images, objects,
139 frames, and many other tags that can contain a URL. If you need
140 anything more complex, you can create your own subclass of
141 C<HTML::LinkExtor> or C<HTML::Parser>. You might even use
142 C<HTML::SimpleLinkExtor> as an example for something specifically
143 suited to your needs.
145 You can use C<URI::Find> to extract URLs from an arbitrary text document.
147 Less complete solutions involving regular expressions can save
148 you a lot of processing time if you know that the input is simple. One
149 solution from Tom Christiansen runs 100 times faster than most
150 module based approaches but only extracts URLs from anchors where the first
151 attribute is HREF and there are no other attributes.
154 # qxurl - tchrist@perl.com
    # qxurl - tchrist@perl.com (run with: perl -n00)
    print "$2\n" while m{
        < \s*
          A \s+ HREF \s* = \s* (["']) (.*?) \g1
        \s* >
    }gsix;
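The limitation mentioned above (HREF must be the first and only attribute)
is easy to see on a small input. This sketch applies the same pattern to a
made-up string instead of reading files; the sample anchors and the C<@urls>
array are assumptions for illustration.

```perl
use strict;
use warnings;

# Made-up sample input; the third anchor has an attribute before HREF.
my $html = q{<A HREF="http://www.example.com/a">a</A> <a href='/b'>b</a> }
         . q{<a class="c" href="/skipped">c</a>};

my @urls;
while ( $html =~ m{ < \s* A \s+ HREF \s* = \s* (["']) (.*?) \g1 \s* > }gsix ) {
    push @urls, $2;    # $2 is the text between the matching quotes
}

print "$_\n" for @urls;
```

The third link is silently skipped, which is the price paid for the speed of
the regular expression approach.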
161 =head2 How do I download a file from the user's machine? How do I open a file on another machine?
163 In this case, download means to use the file upload feature of HTML
164 forms. You allow the web surfer to specify a file to send to your web
165 server. To you it looks like a download, and to the user it looks
166 like an upload. No matter what you call it, you do it with what's
167 known as B<multipart/form-data> encoding. The C<CGI.pm> module (which
168 comes with Perl as part of the Standard Library) supports this in the
C<start_multipart_form()> method, which isn't the same as the C<startform()>
method.
172 See the section in the C<CGI.pm> documentation on file uploads for code
173 examples and details.
175 =head2 How do I make an HTML pop-up menu with Perl?
177 (contributed by brian d foy)
The C<CGI.pm> module (which comes with Perl) has functions to create
the HTML form widgets. See the C<CGI.pm> documentation for more
examples.

    use CGI qw/:standard/;
    print header, start_html('Favorite Animals'), start_form,
        "What's your favorite animal? ",
        popup_menu( -name => 'animal',
            -values => [ qw( Llama Alpaca Camel Ram ) ] ),
        submit, end_form, end_html;
198 =head2 How do I fetch an HTML file?
200 (contributed by brian d foy)
202 Use the libwww-perl distribution. The C<LWP::Simple> module can fetch web
203 resources and give their content back to you as a string:
205 use LWP::Simple qw(get);
207 my $html = get( "http://www.example.com/index.html" );
209 It can also store the resource directly in a file:
211 use LWP::Simple qw(getstore);
213 getstore( "http://www.example.com/index.html", "foo.html" );
If you need to do something more complicated, you can use the
C<LWP::UserAgent> module to create your own user-agent (e.g. browser)
217 to get the job done. If you want to simulate an interactive web
218 browser, you can use the C<WWW::Mechanize> module.
220 =head2 How do I automate an HTML form submission?
222 If you are doing something complex, such as moving through many pages
and forms on a web site, you can use C<WWW::Mechanize>. See its
224 documentation for all the details.
226 If you're submitting values using the GET method, create a URL and encode
227 the form using the C<query_form> method:
    use LWP::Simple;
    use URI::URL;

    my $url = url('http://www.perl.com/cgi-bin/cpan_mod');
    $url->query_form(module => 'DB_File', readme => 1);
    $content = get($url);
236 If you're using the POST method, create your own user agent and encode
237 the content appropriately.
    use HTTP::Request::Common qw(POST);
    use LWP::UserAgent;

    $ua = LWP::UserAgent->new();
    my $req = POST 'http://www.perl.com/cgi-bin/cpan_mod',
                   [ module => 'DB_File', readme => 1 ];
    $content = $ua->request($req)->as_string;
247 =head2 How do I decode or create those %-encodings on the web?
248 X<URI> X<CGI.pm> X<CGI> X<URI::Escape> X<RFC 2396>
250 (contributed by brian d foy)
252 Those C<%> encodings handle reserved characters in URIs, as described
253 in RFC 2396, Section 2. This encoding replaces the reserved character
254 with the hexadecimal representation of the character's number from
255 the US-ASCII table. For instance, a colon, C<:>, becomes C<%3A>.
257 In CGI scripts, you don't have to worry about decoding URIs if you are
258 using C<CGI.pm>. You shouldn't have to process the URI yourself,
259 either on the way in or the way out.
261 If you have to encode a string yourself, remember that you should
262 never try to encode an already-composed URI. You need to escape the
263 components separately then put them together. To encode a string, you
264 can use the C<URI::Escape> module. The C<uri_escape> function
265 returns the escaped string:
    use URI::Escape;

    my $original = "Colon : Hash # Percent %";

    my $escaped = uri_escape( $original );

    print "$escaped\n"; # 'Colon%20%3A%20Hash%20%23%20Percent%20%25'
273 To decode the string, use the C<uri_unescape> function:
275 my $unescaped = uri_unescape( $escaped );
277 print $unescaped; # back to original
If you wanted to do it yourself, you simply need to replace the
reserved characters with their encodings. A global substitution
is one way to do it:

    # encode
    $string =~ s/([^A-Za-z0-9\-_.!~*'()])/ sprintf "%%%02X", ord $1 /eg;

    # decode
    $string =~ s/%([A-Fa-f\d]{2})/chr hex $1/eg;
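As a quick check of the do-it-yourself approach, the sketch below wraps the
two substitutions in helper functions and round-trips the same sample string
used with C<URI::Escape> earlier. The function names C<percent_encode> and
C<percent_decode> are hypothetical, and the code assumes single-byte
characters; it uses a two-digit, zero-padded C<%02X> format so that low
character codes still produce a valid escape.

```perl
use strict;
use warnings;

# Hypothetical helper: escape everything outside the RFC 2396
# "unreserved" set. Assumes each character fits in one byte.
sub percent_encode {
    my ($string) = @_;
    $string =~ s/([^A-Za-z0-9\-_.!~*'()])/ sprintf "%%%02X", ord $1 /eg;
    return $string;
}

# Hypothetical helper: turn each %XX escape back into its character.
sub percent_decode {
    my ($string) = @_;
    $string =~ s/%([A-Fa-f\d]{2})/chr hex $1/eg;
    return $string;
}

my $original = "Colon : Hash # Percent %";
my $escaped  = percent_encode($original);

print "$escaped\n";
print percent_decode($escaped), "\n";
```

Remember to apply this to individual URI components, never to an
already-composed URI.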
289 =head2 How do I redirect to another page?
291 Specify the complete URL of the destination (even if it is on the same
292 server). This is one of the two different kinds of CGI "Location:"
293 responses which are defined in the CGI specification for a Parsed Headers
294 script. The other kind (an absolute URLpath) is resolved internally to
295 the server without any HTTP redirection. The CGI specifications do not
296 allow relative URLs in either case.
298 Use of C<CGI.pm> is strongly recommended. This example shows redirection
299 with a complete URL. This redirection is handled by the web browser.
301 use CGI qw/:standard/;
303 my $url = 'http://www.cpan.org/';
304 print redirect($url);
306 This example shows a redirection with an absolute URLpath. This
307 redirection is handled by the local web server.
309 my $url = '/CPAN/index.html';
310 print redirect($url);
But if coded directly, it could be as follows (the final "\n" is
shown separately, for clarity), using either a complete URL or
an absolute URLpath.

    print "Location: $url\n";   # CGI response header
    print "\n";                 # end of headers
319 =head2 How do I put a password on my web pages?
321 To enable authentication for your web server, you need to configure
322 your web server. The configuration is different for different sorts
of web servers--Apache does it differently from iPlanet, which does
324 it differently from IIS. Check your web server documentation for
325 the details for your particular server.
327 =head2 How do I edit my .htpasswd and .htgroup files with Perl?
329 The C<HTTPD::UserAdmin> and C<HTTPD::GroupAdmin> modules provide a
330 consistent OO interface to these files, regardless of how they're
331 stored. Databases may be text, dbm, Berkeley DB or any database with
332 a DBI compatible driver. C<HTTPD::UserAdmin> supports files used by the
333 "Basic" and "Digest" authentication schemes. Here's an example:
    use HTTPD::UserAdmin ();
    HTTPD::UserAdmin
        ->new(DB => "/foo/.htpasswd")
        ->add($username => $password);
340 =head2 How do I make sure users can't enter values into a form that cause my CGI script to do bad things?
342 (contributed by brian d foy)
344 You can't really prevent people from sending your script bad data, at
345 least not with Perl, which works on the server side. If you want to
346 prevent data that try to use SQL injection or other sorts of attacks
(and you should want to), you have to not trust any data that enter
your program.
350 The L<perlsec> documentation has general advice about data security.
If you are using the C<DBI> module, use placeholders to fill in data.
352 If you are running external programs with C<system> or C<exec>, use
353 the list forms. There are many other precautions that you should take,
354 too many to list here, and most of them fall under the category of not
355 using any data that you don't intend to use. Trust no one.
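The advice about list forms can be illustrated with a short sketch. It uses
the list form of a piped C<open> (the same idea as list-form C<system>) to
run a child perl, so that hostile-looking input is passed as a single
argument and never parsed by a shell. The input string is a made-up example,
and the sketch assumes a platform where list-form pipe opens are supported.

```perl
use strict;
use warnings;

# Made-up hostile input: shell metacharacters that a single-string
# command would hand to the shell for interpretation.
my $input = 'harmless; echo INJECTED';

# List form: program and arguments are passed separately, so no shell
# ever sees $input. $^X is the path of the running perl binary.
open( my $child, '-|', $^X, '-e', 'print $ARGV[0]', $input )
    or die "Can't start child: $!";
my $echoed = do { local $/; <$child> };    # slurp the child's output
close($child) or die "Child failed: $?";

print "$echoed\n";
```

The child receives the whole string, semicolon included, as one argument;
nothing after the semicolon is executed as a command.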
357 =head2 How do I parse a mail header?
For a quick-and-dirty solution, try this approach derived
from L<perlfunc/split>:

    $/ = '';                     # paragraph mode: read the whole header
    $header = <MSG>;             # MSG: an already-open mail filehandle
    $header =~ s/\n\s+/ /g;      # merge continuation lines
    %head = ( UNIX_FROM_LINE => split /^([-\w]+):\s*/m, $header );
367 That solution doesn't do well if, for example, you're trying to
368 maintain all the Received lines. A more complete approach is to use
369 the C<Mail::Header> module from CPAN (part of the C<MailTools> package).
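Here is a self-contained sketch of the quick-and-dirty approach applied to
an in-memory header instead of a filehandle. The header text is invented
sample data, and the C<chomp> call is an addition to strip the newline that
C<split> leaves at the end of each value.

```perl
use strict;
use warnings;

# Made-up header block, including a folded (continued) Received line.
my $header = "From alice\@example.com Mon Jan  1 00:00:00 2001\n"
           . "Subject: Hello\n"
           . "Received: from a.example\n\tby b.example\n"
           . "To: bob\@example.com\n";

$header =~ s/\n\s+/ /g;    # merge continuation lines
my %head = ( UNIX_FROM_LINE => split /^([-\w]+):\s*/m, $header );
chomp(%head);              # drop the newline left on each value

print "$_: $head{$_}\n" for sort keys %head;
```

Because the result is a plain hash, a second Received line would overwrite
the first, which is the weakness noted above.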
371 =head2 How do I decode a CGI form?
373 (contributed by brian d foy)
375 Use the C<CGI.pm> module that comes with Perl. It's quick,
376 it's easy, and it actually does quite a bit of work to
377 ensure things happen correctly. It handles GET, POST, and
378 HEAD requests, multipart forms, multivalued fields, query
379 string and message body combinations, and many other things
380 you probably don't want to think about.
It doesn't get much easier: the C<CGI.pm> module automatically
parses the input and makes each value available through the
C<param()> function.

    use CGI qw(:standard);
388 my $total = param( 'price' ) + param( 'shipping' );
390 my @items = param( 'item' ); # multiple values, same field name
392 If you want an object-oriented approach, C<CGI.pm> can do that too.
    use CGI;

    my $cgi = CGI->new();
398 my $total = $cgi->param( 'price' ) + $cgi->param( 'shipping' );
400 my @items = $cgi->param( 'item' );
You might also try C<CGI::Minimal> which is a lightweight version
of the same thing. Other CGI::* modules on CPAN might work better
for you.
406 Many people try to write their own decoder (or copy one from
407 another program) and then run into one of the many "gotchas"
408 of the task. It's much easier and less hassle to use C<CGI.pm>.
410 =head2 How do I check a valid mail address?
412 (partly contributed by Aaron Sherman)
414 This isn't as simple a question as it sounds. There are two parts:
416 a) How do I verify that an email address is correctly formatted?
418 b) How do I verify that an email address targets a valid recipient?
420 Without sending mail to the address and seeing whether there's a human
421 on the other end to answer you, you cannot fully answer part I<b>, but
422 either the C<Email::Valid> or the C<RFC::RFC822::Address> module will do
423 both part I<a> and part I<b> as far as you can in real-time.
425 If you want to just check part I<a> to see that the address is valid
426 according to the mail header standard with a simple regular expression,
427 you can have problems, because there are deliverable addresses that
428 aren't RFC-2822 (the latest mail header standard) compliant, and
addresses that aren't deliverable which are compliant. However, the
430 following will match valid RFC-2822 addresses that do not have comments,
431 folding whitespace, or any other obsolete or non-essential elements.
432 This I<just> matches the address itself:
434 my $atom = qr{[a-zA-Z0-9_!#\$\%&'*+/=?\^`{}~|\-]+};
435 my $dot_atom = qr{$atom(?:\.$atom)*};
436 my $quoted = qr{"(?:\\[^\r\n]|[^\\"])*"};
437 my $local = qr{(?:$dot_atom|$quoted)};
438 my $quotedpair = qr{\\[\x00-\x09\x0B-\x0c\x0e-\x7e]};
439 my $domain_lit = qr{\[(?:$quotedpair|[\x21-\x5a\x5e-\x7e])*\]};
440 my $domain = qr{(?:$dot_atom|$domain_lit)};
441 my $addr_spec = qr{$local\@$domain};
443 Just match an address against C</^${addr_spec}$/> to see if it follows
444 the RFC2822 specification. However, because it is impossible to be
445 sure that such a correctly formed address is actually the correct way
446 to reach a particular person or even has a mailbox associated with it,
447 you must be very careful about how you use this.
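As a usage sketch, the pattern above can be wrapped in a small predicate and
tried on a few candidate strings. The function name C<looks_like_addr_spec>
and the sample addresses are assumptions for illustration; the pattern
definitions are repeated so the example runs on its own.

```perl
use strict;
use warnings;

my $atom       = qr{[a-zA-Z0-9_!#\$\%&'*+/=?\^`{}~|\-]+};
my $dot_atom   = qr{$atom(?:\.$atom)*};
my $quoted     = qr{"(?:\\[^\r\n]|[^\\"])*"};
my $local      = qr{(?:$dot_atom|$quoted)};
my $quotedpair = qr{\\[\x00-\x09\x0B-\x0c\x0e-\x7e]};
my $domain_lit = qr{\[(?:$quotedpair|[\x21-\x5a\x5e-\x7e])*\]};
my $domain     = qr{(?:$dot_atom|$domain_lit)};
my $addr_spec  = qr{$local\@$domain};

# Hypothetical helper: true if the whole string is a bare addr-spec.
sub looks_like_addr_spec {
    my ($candidate) = @_;
    return $candidate =~ /\A$addr_spec\z/ ? 1 : 0;
}

for my $candidate ( 'fred@example.com', '"fred bloggs"@example.com',
                    'fred..bloggs@example.com', 'no-at-sign' ) {
    printf "%-28s %s\n", $candidate,
        looks_like_addr_spec($candidate) ? 'well-formed' : 'rejected';
}
```

A "well-formed" result only says the string has the right shape; it says
nothing about whether mail sent there will reach anyone.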
449 Our best advice for verifying a person's mail address is to have them
450 enter their address twice, just as you normally do to change a
451 password. This usually weeds out typos. If both versions match, send
452 mail to that address with a personal message. If you get the message
453 back and they've followed your directions, you can be reasonably
454 assured that it's real.
456 A related strategy that's less open to forgery is to give them a PIN
457 (personal ID number). Record the address and PIN (best that it be a
458 random one) for later processing. In the mail you send, ask them to
459 include the PIN in their reply. But if it bounces, or the message is
460 included via a "vacation" script, it'll be there anyway. So it's
461 best to ask them to mail back a slight alteration of the PIN, such as
462 with the characters reversed, one added or subtracted to each digit, etc.
464 =head2 How do I decode a MIME/BASE64 string?
466 The C<MIME-Base64> package (available from CPAN) handles this as well as
467 the MIME/QP encoding. Decoding BASE64 becomes as simple as:
    use MIME::Base64;
    $decoded = decode_base64($encoded);
The C<MIME-Tools> package (available from CPAN) supports extraction with
decoding of BASE64 encoded attachments and content directly from email
messages.
476 If the string to decode is short (less than 84 bytes long)
477 a more direct approach is to use the C<unpack()> function's "u"
478 format after minor transliterations:
480 tr#A-Za-z0-9+/##cd; # remove non-base64 chars
481 tr#A-Za-z0-9+/# -_#; # convert to uuencoded format
482 $len = pack("c", 32 + 0.75*length); # compute length byte
483 print unpack("u", $len . $_); # uudecode and print
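For comparison, a round trip through C<MIME::Base64> is sketched below. The
sample text is arbitrary; the empty string passed as the second argument to
C<encode_base64> suppresses the trailing newline the function would
otherwise append.

```perl
use strict;
use warnings;
use MIME::Base64 qw(encode_base64 decode_base64);

# Encode a short sample string, then decode it again.
my $encoded = encode_base64( "Hello, World!", '' );   # '' = no line ending
my $decoded = decode_base64($encoded);

print "$encoded\n";
print "$decoded\n";
```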
485 =head2 How do I return the user's mail address?
487 On systems that support getpwuid, the C<< $< >> variable, and the
488 C<Sys::Hostname> module (which is part of the standard perl distribution),
489 you can probably try using something like this:
    use Sys::Hostname;
    $address = sprintf('%s@%s', scalar getpwuid($<), hostname);
494 Company policies on mail address can mean that this generates addresses
495 that the company's mail system will not accept, so you should ask for
496 users' mail addresses when this matters. Furthermore, not all systems
497 on which Perl runs are so forthcoming with this information as is Unix.
499 The C<Mail::Util> module from CPAN (part of the C<MailTools> package) provides a
500 C<mailaddress()> function that tries to guess the mail address of the user.
501 It makes a more intelligent guess than the code above, using information
502 given when the module was installed, but it could still be incorrect.
503 Again, the best way is often just to ask the user.
505 =head2 How do I send mail?
507 Use the C<sendmail> program directly:
    open(SENDMAIL, "|/usr/lib/sendmail -oi -t -odq")
        or die "Can't fork for sendmail: $!\n";
    print SENDMAIL <<"EOF";
    From: User Originating Mail <me\@host>
    To: Final Destination <you\@otherhost>
    Subject: A relevant subject line

    Body of the message goes here after the blank line
    in as many lines as you like.
    EOF
    close(SENDMAIL) or warn "sendmail didn't close nicely";
521 The B<-oi> option prevents C<sendmail> from interpreting a line consisting
522 of a single dot as "end of message". The B<-t> option says to use the
523 headers to decide who to send the message to, and B<-odq> says to put
the message into the queue. This last option means your message won't
be immediately delivered, so leave it out if you want immediate
delivery.
Alternate, less convenient approaches include calling C<mail> (sometimes
called C<mailx>) directly or simply opening up port 25 and having an
intimate conversation between just you and the remote SMTP daemon,
probably C<sendmail>.
Or you might be able to use the CPAN module C<Mail::Mailer>:

    use Mail::Mailer;
    $mailer = Mail::Mailer->new();
    $mailer->open({ From => $from_address, To => $to_address, Subject => $subject })
        or die "Can't open: $!\n";
    print $mailer $body;
    $mailer->close();
546 The C<Mail::Internet> module uses C<Net::SMTP> which is less Unix-centric than
547 C<Mail::Mailer>, but less reliable. Avoid raw SMTP commands. There
548 are many reasons to use a mail transport agent like C<sendmail>. These
549 include queuing, MX records, and security.
551 =head2 How do I use MIME to make an attachment to a mail message?
553 This answer is extracted directly from the C<MIME::Lite> documentation.
554 Create a multipart message (i.e., one with attachments).
    use MIME::Lite;

    ### Create a new multipart message:
    $msg = MIME::Lite->new(
        From    =>'me@myhost.com',
        To      =>'you@yourhost.com',
        Cc      =>'some@other.com, some@more.com',
        Subject =>'A message with 2 parts...',
        Type    =>'multipart/mixed'
    );

    ### Add parts (each "attach" has same arguments as "new"):
    $msg->attach(
        Type     =>'TEXT',
        Data     =>"Here's the GIF file you wanted"
    );
    $msg->attach(
        Type     =>'image/gif',
        Path     =>'aaa000123.gif',
        Filename =>'logo.gif'
    );

    $text = $msg->as_string;
578 C<MIME::Lite> also includes a method for sending these things.
582 This defaults to using L<sendmail> but can be customized to use
583 SMTP via L<Net::SMTP>.
585 =head2 How do I read mail?
587 While you could use the C<Mail::Folder> module from CPAN (part of the
588 C<MailFolder> package) or the C<Mail::Internet> module from CPAN (part
of the C<MailTools> package), often a module is overkill. Here's a
mail sorter.
596 $/ = ''; # paragraph reads
599 /^Subject:\s*(?:Re:\s*)*(.*)/mi;
600 $sub[++$msgno] = lc($1) || '';
604 for my $i (sort { $sub[$a] cmp $sub[$b] || $a <=> $b } (0 .. $#msgs)) {
611 # bysub2 - awkish sort-by-subject
612 BEGIN { $msgno = -1 }
613 $sub[++$msgno] = (/^Subject:\s*(?:Re:\s*)*(.*)/mi)[0] if /^From/m;
615 END { print @msg[ sort { $sub[$a] cmp $sub[$b] || $a <=> $b } (0 .. $#msg) ] }
617 =head2 How do I find out my hostname, domainname, or IP address?
618 X<hostname, domainname, IP address, host, domain, hostfqdn, inet_ntoa,
619 gethostbyname, Socket, Net::Domain, Sys::Hostname>
621 (contributed by brian d foy)
623 The C<Net::Domain> module, which is part of the standard distribution starting
624 in perl5.7.3, can get you the fully qualified domain name (FQDN), the host
625 name, or the domain name.
627 use Net::Domain qw(hostname hostfqdn hostdomain);
629 my $host = hostfqdn();
631 The C<Sys::Hostname> module, included in the standard distribution since
632 perl5.6, can also get the hostname.
638 To get the IP address, you can use the C<gethostbyname> built-in function
639 to turn the name into a number. To turn that number into the dotted octet
640 form (a.b.c.d) that most people expect, use the C<inet_ntoa> function
641 from the C<Socket> module, which also comes with perl.
    use Socket;

    my $address = inet_ntoa(
        scalar gethostbyname( $host || 'localhost' )
    );
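To see what "turn that number into the dotted octet form" means without
depending on name resolution, this sketch converts a literal address to its
packed four-byte form with C<inet_aton> and back with C<inet_ntoa>. The
address 192.0.2.1 is just sample data from a range reserved for
documentation.

```perl
use strict;
use warnings;
use Socket qw(inet_aton inet_ntoa);

# 192.0.2.1 is a documentation-only address, used here as sample data.
my $packed = inet_aton('192.0.2.1');    # four raw bytes, as gethostbyname returns
my $dotted = inet_ntoa($packed);        # back to the a.b.c.d form

printf "%d bytes -> %s\n", length $packed, $dotted;
```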
649 =head2 How do I fetch a news article or the active newsgroups?
651 Use the C<Net::NNTP> or C<News::NNTPClient> modules, both available from CPAN.
652 This can make tasks like fetching the newsgroup list as simple as
654 perl -MNews::NNTPClient
655 -e 'print News::NNTPClient->new->list("newsgroups")'
657 =head2 How do I fetch/put an FTP file?
659 (contributed by brian d foy)
The C<LWP> family of modules (available on CPAN as the libwww-perl distribution)
662 can work with FTP just like it can with many other protocols. C<LWP::Simple>
663 makes it quite easy to fetch a file:
    use LWP::Simple;

    my $data = get( 'ftp://some.ftp.site/some/file.txt' );
669 If you want more direct or low-level control of the FTP process, you can use
the C<Net::FTP> module (in the Standard Library since Perl 5.8). Its
documentation has examples showing you just how to do that.
673 =head2 How can I do RPC in Perl?
675 (contributed by brian d foy)
677 Use one of the RPC modules you can find on CPAN (
678 http://search.cpan.org/search?query=RPC&mode=all ).
680 =head1 AUTHOR AND COPYRIGHT
682 Copyright (c) 1997-2010 Tom Christiansen, Nathan Torkington, and
683 other authors as noted. All rights reserved.
685 This documentation is free; you can redistribute it and/or modify it
686 under the same terms as Perl itself.
688 Irrespective of its distribution, all code examples in this file
689 are hereby placed into the public domain. You are permitted and
690 encouraged to use this code in your own programs for fun
691 or for profit as you see fit. A simple comment in the code giving
692 credit would be courteous but is not required.