This section deals with questions related to networking, the internet,
and the web.

=head2 What is the correct form of response from a CGI script?

(Alan Flavell <flavell+www@a5.ph.gla.ac.uk> answers...)

The Common Gateway Interface (CGI) specifies a software interface between
a program ("CGI script") and a web server (HTTPD). It is not specific
to Perl, and has its own FAQs and tutorials, and a usenet group,
comp.infosystems.www.authoring.cgi

The CGI specification is outlined in an informational RFC:
http://www.ietf.org/rfc/rfc3875

These Perl FAQs very selectively cover some CGI issues. However, Perl
programmers are strongly advised to use the C<CGI.pm> module, to take care
of the details for them.

The similarity between CGI response headers (defined in the CGI
specification) and HTTP response headers (defined in the HTTP
specification, RFC2616) is intentional, but can sometimes be confusing.

The CGI specification defines two kinds of script: the "Parsed Header"
script, and the "Non Parsed Header" (NPH) script. Check your server
documentation to see what it supports. "Parsed Header" scripts are
simpler in various respects. The CGI specification allows any of the
usual newline representations in the CGI response (it's the server's
job to create an accurate HTTP response based on it). So "\n" written in
text mode is technically correct, and recommended. NPH scripts are
trickier: they must put out a complete and accurate set of HTTP
transaction response headers; the HTTP specification calls for records
to be terminated with carriage-return and line-feed, i.e., ASCII \015\012
written in binary mode.

Using C<CGI.pm> gives excellent platform independence, including EBCDIC
systems. C<CGI.pm> selects an appropriate newline representation
(C<$CGI::CRLF>) and sets binmode as appropriate.

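As a minimal sketch of a hand-written "Parsed Header" response (assuming
a server that parses CGI headers; the sub name C<cgi_response> is made up
for illustration), the script emits its CGI header lines, a blank line,
and then the body, all with plain "\n" line endings:

```perl
use strict;
use warnings;

# Hypothetical helper: build a Parsed Header CGI response as one string.
# The server, not the script, turns this into the real HTTP response.
sub cgi_response {
    my ($content_type, $body) = @_;
    return "Content-Type: $content_type\n"   # CGI response header
         . "\n"                              # blank line ends the headers
         . $body;
}

print cgi_response('text/html', "<p>Hello from CGI</p>\n");
```

An NPH script could not use this as-is: it would also have to write the
HTTP status line itself and terminate each header with \015\012.
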
=head2 My CGI script runs from the command line but not the browser. (500 Server Error)

(contributed by brian d foy)

There are many things that might be wrong with your CGI program, and only
some of them might be related to Perl. Try going through the troubleshooting
guide on Perlmonks:

    http://www.perlmonks.org/?node_id=380424

=head2 How can I get better error messages from a CGI program?

Use the C<CGI::Carp> module. It replaces C<warn> and C<die>, plus the
normal C<Carp> module's C<carp>, C<croak>, and C<confess> functions with
more verbose and safer versions. It still sends them to the normal
server error log.

    use CGI::Carp;
    warn "This is a complaint";
    die "But this one is serious";

The following use of C<CGI::Carp> also redirects errors to a file of your choice,
placed in a C<BEGIN> block to catch compile-time warnings as well:

    BEGIN {
        use CGI::Carp qw(carpout);
        open(LOG, ">>/var/local/cgi-logs/mycgi-log")
            or die "Unable to append to mycgi-log: $!\n";
        carpout(\*LOG);
    }

You can even arrange for fatal errors to go back to the client browser,
which is nice for your own debugging, but might confuse the end user.

    use CGI::Carp qw(fatalsToBrowser);

Even if the error happens before you get the HTTP header out, the module
will try to take care of this to avoid the dreaded server 500 errors.
Normal warnings still go out to the server error log (or wherever
you've sent them with C<carpout>) with the application name and date
stamp prepended.

=head2 How do I remove HTML from a string?

The most correct way (albeit not the fastest) is to use C<HTML::Parser>
from CPAN. Another mostly correct
way is to use C<HTML::FormatText> which not only removes HTML but also
attempts to do a little simple formatting of the resulting plain text.

Many folks attempt a simple-minded regular expression approach, like
C<< s/<.*?>//g >>, but that fails in many cases because the tags
may continue over line breaks, they may contain quoted angle-brackets,
or HTML comments may be present. Plus, folks forget to convert
entities--like C<&lt;> for example.

Here's one "simple-minded" approach, that works for most files:

    #!/usr/bin/perl -p0777
    s/<(?:[^>'"]*|(['"]).*?\g1)*>//gs

If you want a more complete solution, see the 3-stage striphtml
program in
http://www.cpan.org/authors/Tom_Christiansen/scripts/striphtml.gz

Here are some tricky cases that you should think about when picking
a solution:

    <IMG SRC = "foo.gif" ALT = "A > B">

    <script>if (a<b && a>c)</script>

    <![INCLUDE CDATA [ >>>>>>>>>>>> ]]>

If HTML comments include other tags, those solutions would also break
on text like this:

    <!-- This section commented out.
        <B>You can't see me!</B>
    -->

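To see the difference between the two substitutions on the first tricky
case, here is a small sketch; the sub names C<strip_naive> and
C<strip_quote_aware> are made up for illustration, and neither sub is a
substitute for a real parser:

```perl
use strict;
use warnings;

# Naive approach: stops at the first ">" even when it sits inside quotes.
sub strip_naive {
    my ($html) = @_;
    $html =~ s/<.*?>//g;
    return $html;
}

# Quote-aware approach: skips over quoted attribute values inside a tag.
sub strip_quote_aware {
    my ($html) = @_;
    $html =~ s/<(?:[^>'"]*|(['"]).*?\g1)*>//gs;
    return $html;
}

my $tag = q{<IMG SRC = "foo.gif" ALT = "A > B">};
print "naive:       [", strip_naive($tag), "]\n";
print "quote-aware: [", strip_quote_aware($tag), "]\n";
```

The naive version leaves the tail of the tag (C< BE<quot>E<gt>>) behind,
while the quote-aware version removes the whole tag. Comments and script
bodies can still defeat both.
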
=head2 How do I extract URLs?

You can easily extract all sorts of URLs from HTML with
C<HTML::SimpleLinkExtor> which handles anchors, images, objects,
frames, and many other tags that can contain a URL. If you need
anything more complex, you can create your own subclass of
C<HTML::LinkExtor> or C<HTML::Parser>. You might even use
C<HTML::SimpleLinkExtor> as an example for something specifically
suited to your needs.

You can use C<URI::Find> to extract URLs from an arbitrary text document.

Less complete solutions involving regular expressions can save
you a lot of processing time if you know that the input is simple. One
solution from Tom Christiansen runs 100 times faster than most
module-based approaches but only extracts URLs from anchors where the first
attribute is HREF and there are no other attributes.

    #!/usr/bin/perl -n00
    # qxurl - tchrist@perl.com
    print "$2\n" while m{
        < \s* A \s+ HREF \s* = \s* (["']) (.*?) \g1 \s* >
    }gsix;

=head2 How do I download a file from the user's machine? How do I open a file on another machine?

In this case, download means to use the file upload feature of HTML
forms. You allow the web surfer to specify a file to send to your web
server. To you it looks like a download, and to the user it looks
like an upload. No matter what you call it, you do it with what's
known as B<multipart/form-data> encoding. The C<CGI.pm> module (which
shipped with Perl until version 5.22 and is now available from CPAN)
supports this in the C<start_multipart_form()> method, which isn't the
same as the C<startform()> method.

See the section in the C<CGI.pm> documentation on file uploads for code
examples and details.

=head2 How do I make an HTML pop-up menu with Perl?

(contributed by brian d foy)

The C<CGI.pm> module (which shipped with Perl until version 5.22) has
functions to create the HTML form widgets. See the C<CGI.pm>
documentation for more details.

    use CGI qw/:standard/;

    print header, start_html('Favorite Animals'), start_form,
        "What's your favorite animal? ",
        popup_menu( -name   => 'animal',
                    -values => [ qw( Llama Alpaca Camel Ram ) ] ),
        submit, end_form, end_html;

=head2 How do I fetch an HTML file?

(contributed by brian d foy)

Use the libwww-perl distribution. The C<LWP::Simple> module can fetch web
resources and give their content back to you as a string:

    use LWP::Simple qw(get);

    my $html = get( "http://www.example.com/index.html" );

It can also store the resource directly in a file:

    use LWP::Simple qw(getstore);

    getstore( "http://www.example.com/index.html", "foo.html" );

If you need to do something more complicated, you can use the
C<LWP::UserAgent> module to create your own user-agent (e.g. browser)
to get the job done. If you want to simulate an interactive web
browser, you can use the C<WWW::Mechanize> module.

=head2 How do I automate an HTML form submission?

If you are doing something complex, such as moving through many pages
and forms on a web site, you can use C<WWW::Mechanize>. See its
documentation for all the details.

If you're submitting values using the GET method, create a URL and encode
the form using the C<query_form> method:

    use LWP::Simple;
    use URI::URL;

    my $url = url('http://www.perl.com/cgi-bin/cpan_mod');
    $url->query_form(module => 'DB_File', readme => 1);
    my $content = get($url);

If you're using the POST method, create your own user agent and encode
the content appropriately.

    use HTTP::Request::Common qw(POST);
    use LWP::UserAgent;

    my $ua = LWP::UserAgent->new();
    my $req = POST 'http://www.perl.com/cgi-bin/cpan_mod',
                   [ module => 'DB_File', readme => 1 ];
    my $content = $ua->request($req)->as_string;

=head2 How do I decode or create those %-encodings on the web?
X<URI> X<CGI.pm> X<CGI> X<URI::Escape> X<RFC 2396>

(contributed by brian d foy)

Those C<%> encodings handle reserved characters in URIs, as described
in RFC 2396, Section 2. This encoding replaces the reserved character
with the hexadecimal representation of the character's number from
the US-ASCII table. For instance, a colon, C<:>, becomes C<%3A>.

In CGI scripts, you don't have to worry about decoding URIs if you are
using C<CGI.pm>. You shouldn't have to process the URI yourself,
either on the way in or the way out.

If you have to encode a string yourself, remember that you should
never try to encode an already-composed URI. You need to escape the
components separately then put them together. To encode a string, you
can use the C<URI::Escape> module. The C<uri_escape> function
returns the escaped string:

    use URI::Escape;

    my $original = "Colon : Hash # Percent %";

    my $escaped = uri_escape( $original );

    print "$escaped\n"; # 'Colon%20%3A%20Hash%20%23%20Percent%20%25'

To decode the string, use the C<uri_unescape> function:

    my $unescaped = uri_unescape( $escaped );

    print $unescaped; # back to original

If you wanted to do it yourself, you simply need to replace the
reserved characters with their encodings. A global substitution
is one way to do it:

    # encode
    $string =~ s/([^A-Za-z0-9\-_.!~*'()])/ sprintf "%%%02X", ord $1 /eg;

    # decode
    $string =~ s/%([A-Fa-f\d]{2})/chr hex $1/eg;

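The advice to escape components separately and only then assemble the
URI can be sketched as follows. The sub names C<escape_component> and
C<build_url> and the example.com URL are made up for illustration, and
the code assumes the values are byte strings (for instance, already
UTF-8 encoded):

```perl
use strict;
use warnings;

# Hypothetical helper: percent-encode everything outside the
# RFC 2396 unreserved set.
sub escape_component {
    my ($string) = @_;
    $string =~ s/([^A-Za-z0-9\-_.!~*'()])/ sprintf "%%%02X", ord $1 /eg;
    return $string;
}

# Hypothetical helper: escape each key and value on its own, then
# join them with the "=" and "&" delimiters, which stay unescaped.
sub build_url {
    my ($base, @pairs) = @_;
    my @parts;
    while (my ($key, $value) = splice @pairs, 0, 2) {
        push @parts, escape_component($key) . '=' . escape_component($value);
    }
    return @parts ? $base . '?' . join('&', @parts) : $base;
}

print build_url('http://www.example.com/search',
                q => 'perl & cgi', lang => 'en'), "\n";
```

Escaping the assembled URL instead would also encode the C<?>, C<=>, and
C<&> delimiters and break the query string.
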
=head2 How do I redirect to another page?

Specify the complete URL of the destination (even if it is on the same
server). This is one of the two different kinds of CGI "Location:"
responses which are defined in the CGI specification for a Parsed Headers
script. The other kind (an absolute URLpath) is resolved internally to
the server without any HTTP redirection. The CGI specifications do not
allow relative URLs in either case.

Use of C<CGI.pm> is strongly recommended. This example shows redirection
with a complete URL. This redirection is handled by the web browser.

    use CGI qw/:standard/;

    my $url = 'http://www.cpan.org/';
    print redirect($url);

This example shows a redirection with an absolute URLpath. This
redirection is handled by the local web server.

    my $url = '/CPAN/index.html';
    print redirect($url);

But if coded directly, it could be as follows (the final "\n" is
shown separately, for clarity), using either a complete URL or
an absolute URLpath.

    print "Location: $url\n";   # CGI response header
    print "\n";                 # end of headers

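Since relative URLs are not allowed in either kind of "Location:"
response, a hand-coded script may want to check the target first. This
is only a sketch: the sub name C<location_response> and its two pattern
checks are assumptions for illustration, not part of C<CGI.pm> or the
CGI specification.

```perl
use strict;
use warnings;

# Hypothetical helper: build a CGI "Location:" response for a Parsed
# Header script, refusing targets that are neither a complete URL nor
# an absolute URLpath.
sub location_response {
    my ($target) = @_;
    my $is_complete_url  = $target =~ m{^[A-Za-z][A-Za-z0-9+.\-]*:};  # has a scheme
    my $is_absolute_path = $target =~ m{^/};
    die "Relative URL not allowed: $target\n"
        unless $is_complete_url || $is_absolute_path;
    return "Location: $target\n\n";   # header line, then end of headers
}

print location_response('/CPAN/index.html');
```
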
=head2 How do I put a password on my web pages?

To enable authentication for your web server, you need to configure
your web server. The configuration is different for different sorts
of web servers--Apache does it differently from iPlanet which does
it differently from IIS. Check your web server documentation for
the details for your particular server.

=head2 How do I edit my .htpasswd and .htgroup files with Perl?

The C<HTTPD::UserAdmin> and C<HTTPD::GroupAdmin> modules provide a
consistent OO interface to these files, regardless of how they're
stored. Databases may be text, dbm, Berkeley DB or any database with
a DBI compatible driver. C<HTTPD::UserAdmin> supports files used by the
"Basic" and "Digest" authentication schemes. Here's an example:

    use HTTPD::UserAdmin ();
    HTTPD::UserAdmin
        ->new(DB => "/foo/.htpasswd")
        ->add($username => $password);

=head2 How do I make sure users can't enter values into a form that cause my CGI script to do bad things?

(contributed by brian d foy)

You can't prevent people from sending your script bad data. Even if
you add some client-side checks, people may disable them or bypass
them completely. For instance, someone might use a module such as
C<LWP> to access your CGI program. If you want to prevent data that
try to use SQL injection or other sorts of attacks (and you should
want to), you have to not trust any data that enter your program.

The L<perlsec> documentation has general advice about data security.
If you are using the C<DBI> module, use placeholders to fill in data.
If you are running external programs with C<system> or C<exec>, use
the list forms. There are many other precautions that you should take,
too many to list here, and most of them fall under the category of not
using any data that you don't intend to use. Trust no one.

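As a small illustration of the list form of C<system>, the sketch below
passes an untrusted string as a single argument, so no shell ever
interprets it. The sub name C<run_without_shell> is made up, and the
child command is simply the running perl (C<$^X>) so that the example is
self-contained:

```perl
use strict;
use warnings;

# Hypothetical helper: run a program without involving a shell, so
# metacharacters in the arguments are passed through as plain data.
sub run_without_shell {
    my ($program, @args) = @_;
    my $status = system { $program } $program, @args;
    return $status == -1 ? -1 : $status >> 8;   # exit code, or -1 if it could not start
}

# The last argument plays the role of untrusted form data.
my $untrusted = 'report.txt; echo injected';
run_without_shell($^X, '-e', 'print "got: $ARGV[0]\n"', $untrusted);
```

The child sees the whole string, semicolon included, as one argument.
With the single-string form of C<system>, a shell would have treated the
part after the semicolon as a second command.
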
=head2 How do I parse a mail header?

For a quick-and-dirty solution, try this one derived
from L<perlfunc/split>:

    $/ = '';                      # paragraph reads
    $header = <MSG>;              # MSG is an already-open filehandle
    $header =~ s/\n\s+/ /g;       # merge continuation lines
    %head = ( UNIX_FROM_LINE, split /^([-\w]+):\s*/m, $header );

That solution doesn't do well if, for example, you're trying to
maintain all the Received lines. A more complete approach is to use
the C<Mail::Header> module from CPAN (part of the C<MailTools> package).

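The quick-and-dirty approach can be tried on a header held in a string.
In this sketch the sub name C<parse_mail_header> and the sample header
are made up for illustration; as noted above, repeated fields such as
Received overwrite one another.

```perl
use strict;
use warnings;

# Hypothetical helper: turn a header block into a hash reference.
sub parse_mail_header {
    my ($header) = @_;
    $header =~ s/\n\s+/ /g;    # merge continuation lines
    my %head = ( UNIX_FROM_LINE => split /^([-\w]+):\s*/m, $header );
    chomp %head;               # drop the newline left on each value
    return \%head;
}

my $head = parse_mail_header(
    "From: me\@host\nSubject: A relevant\n subject line\nTo: you\@otherhost\n"
);
print "$_: $head->{$_}\n"
    for sort grep { $_ ne 'UNIX_FROM_LINE' } keys %$head;
```

The C<UNIX_FROM_LINE> key holds whatever precedes the first field, which
is the empty string for this sample.
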
=head2 How do I decode a CGI form?

(contributed by brian d foy)

Use the C<CGI.pm> module (bundled with Perl before version 5.22, and
available from CPAN). It's quick,
it's easy, and it actually does quite a bit of work to
ensure things happen correctly. It handles GET, POST, and
HEAD requests, multipart forms, multivalued fields, query
string and message body combinations, and many other things
you probably don't want to think about.

It doesn't get much easier: the C<CGI.pm> module automatically
parses the input and makes each value available through the
C<param()> function.

    use CGI qw(:standard);

    my $total = param( 'price' ) + param( 'shipping' );

    my @items = param( 'item' ); # multiple values, same field name

If you want an object-oriented approach, C<CGI.pm> can do that too.

    use CGI;

    my $cgi = CGI->new();

    my $total = $cgi->param( 'price' ) + $cgi->param( 'shipping' );

    my @items = $cgi->param( 'item' );

You might also try C<CGI::Minimal> which is a lightweight version
of the same thing. Other CGI::* modules on CPAN might work better
for you.

Many people try to write their own decoder (or copy one from
another program) and then run into one of the many "gotchas"
of the task. It's much easier and less hassle to use C<CGI.pm>.

=head2 How do I check a valid mail address?

(partly contributed by Aaron Sherman)

This isn't as simple a question as it sounds. There are two parts:

a) How do I verify that an email address is correctly formatted?

b) How do I verify that an email address targets a valid recipient?

Without sending mail to the address and seeing whether there's a human
on the other end to answer you, you cannot fully answer part I<b>, but
either the C<Email::Valid> or the C<RFC::RFC822::Address> module will do
both part I<a> and part I<b> as far as you can in real-time.

If you want to just check part I<a> to see that the address is valid
according to the mail header standard with a simple regular expression,
you can have problems, because there are deliverable addresses that
aren't RFC-2822 (the latest mail header standard) compliant, and
undeliverable addresses that are compliant. However, the
following will match valid RFC-2822 addresses that do not have comments,
folding whitespace, or any other obsolete or non-essential elements.
This I<just> matches the address itself:

    my $atom       = qr{[a-zA-Z0-9_!#\$\%&'*+/=?\^`{}~|\-]+};
    my $dot_atom   = qr{$atom(?:\.$atom)*};
    my $quoted     = qr{"(?:\\[^\r\n]|[^\\"])*"};
    my $local      = qr{(?:$dot_atom|$quoted)};
    my $quotedpair = qr{\\[\x00-\x09\x0B-\x0c\x0e-\x7e]};
    my $domain_lit = qr{\[(?:$quotedpair|[\x21-\x5a\x5e-\x7e])*\]};
    my $domain     = qr{(?:$dot_atom|$domain_lit)};
    my $addr_spec  = qr{$local\@$domain};

Just match an address against C</^${addr_spec}$/> to see if it follows
the RFC2822 specification. However, because it is impossible to be
sure that such a correctly formed address is actually the correct way
to reach a particular person or even has a mailbox associated with it,
you must be very careful about how you use this.

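For a quick check of how the pattern behaves, the sketch below wraps it
in a sub and tries a few strings. The sub name C<looks_like_address> and
the sample addresses are made up for illustration; a true result says
only that the syntax is acceptable, not that the mailbox exists.

```perl
use strict;
use warnings;

my $atom       = qr{[a-zA-Z0-9_!#\$\%&'*+/=?\^`{}~|\-]+};
my $dot_atom   = qr{$atom(?:\.$atom)*};
my $quoted     = qr{"(?:\\[^\r\n]|[^\\"])*"};
my $local      = qr{(?:$dot_atom|$quoted)};
my $quotedpair = qr{\\[\x00-\x09\x0B-\x0c\x0e-\x7e]};
my $domain_lit = qr{\[(?:$quotedpair|[\x21-\x5a\x5e-\x7e])*\]};
my $domain     = qr{(?:$dot_atom|$domain_lit)};
my $addr_spec  = qr{$local\@$domain};

# Hypothetical helper: 1 if the string is a well-formed addr-spec, else 0.
sub looks_like_address {
    my ($candidate) = @_;
    return $candidate =~ /^${addr_spec}$/ ? 1 : 0;
}

for my $candidate ('user@example.com', 'first.last@[192.0.2.1]',
                   'no-at-sign', 'a..b@example.com') {
    printf "%-24s %s\n", $candidate,
        looks_like_address($candidate) ? 'ok' : 'rejected';
}
```
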
Our best advice for verifying a person's mail address is to have them
enter their address twice, just as you normally do to change a
password. This usually weeds out typos. If both versions match, send
mail to that address with a personal message. If you get the message
back and they've followed your directions, you can be reasonably
assured that it's real.

A related strategy that's less open to forgery is to give them a PIN
(personal ID number). Record the address and PIN (best that it be a
random one) for later processing. In the mail you send, ask them to
include the PIN in their reply. But if it bounces, or the message is
included via a "vacation" script, it'll be there anyway. So it's
best to ask them to mail back a slight alteration of the PIN, such as
with the characters reversed, one added or subtracted to each digit, etc.

=head2 How do I decode a MIME/BASE64 string?

The C<MIME-Base64> package (available from CPAN) handles this as well as
the MIME/QP encoding. Decoding BASE64 becomes as simple as:

    use MIME::Base64;
    $decoded = decode_base64($encoded);

The C<MIME-Tools> package (available from CPAN) supports extraction with
decoding of BASE64 encoded attachments and content directly from email
messages.

If the string to decode is short (less than 84 bytes long)
a more direct approach is to use the C<unpack()> function's "u"
format after minor transliterations:

    tr#A-Za-z0-9+/##cd;                   # remove non-base64 chars
    tr#A-Za-z0-9+/# -_#;                  # convert to uuencoded format
    $len = pack("c", 32 + 0.75*length);   # compute length byte
    print unpack("u", $len . $_);         # uudecode and print

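The two approaches can be compared side by side. In this sketch the sub
name C<decode_short_base64> is made up; it wraps the C<unpack> steps
above so that they work on an argument instead of C<$_>, and it assumes
a short input as described.

```perl
use strict;
use warnings;
use MIME::Base64 qw(encode_base64 decode_base64);

# Hypothetical helper: the unpack("u") technique for short strings.
sub decode_short_base64 {
    my ($str) = @_;
    $str =~ tr#A-Za-z0-9+/##cd;                      # remove non-base64 chars
    $str =~ tr#A-Za-z0-9+/# -_#;                     # convert to uuencoded format
    my $len = pack("c", 32 + 0.75 * length($str));   # compute length byte
    return unpack("u", $len . $str);                 # uudecode
}

my $encoded = encode_base64("Hello, Perl!");
print decode_base64($encoded), "\n";         # module approach
print decode_short_base64($encoded), "\n";   # unpack approach
```
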
=head2 How do I return the user's mail address?

On systems that support getpwuid, the C<< $< >> variable, and the
C<Sys::Hostname> module (which is part of the standard perl distribution),
you can probably try using something like this:

    use Sys::Hostname;
    $address = sprintf('%s@%s', scalar getpwuid($<), hostname);

Company policies on mail addresses can mean that this generates addresses
that the company's mail system will not accept, so you should ask for
users' mail addresses when this matters. Furthermore, not all systems
on which Perl runs are as forthcoming with this information as Unix is.

The C<Mail::Util> module from CPAN (part of the C<MailTools> package) provides a
C<mailaddress()> function that tries to guess the mail address of the user.
It makes a more intelligent guess than the code above, using information
given when the module was installed, but it could still be incorrect.
Again, the best way is often just to ask the user.

=head2 How do I send mail?

Use the C<sendmail> program directly:

    open(SENDMAIL, "|/usr/lib/sendmail -oi -t -odq")
        or die "Can't fork for sendmail: $!\n";
    print SENDMAIL <<"EOF";
    From: User Originating Mail <me\@host>
    To: Final Destination <you\@otherhost>
    Subject: A relevant subject line

    Body of the message goes here after the blank line
    in as many lines as you like.
    EOF
    close(SENDMAIL) or warn "sendmail didn't close nicely";

The B<-oi> option prevents C<sendmail> from interpreting a line consisting
of a single dot as "end of message". The B<-t> option says to use the
headers to decide who to send the message to, and B<-odq> says to put
the message into the queue. This last option means your message won't
be immediately delivered, so leave it out if you want immediate
delivery.

Alternate, less convenient approaches include calling C<mail> (sometimes
called C<mailx>) directly or simply opening up port 25 and having an
intimate conversation between just you and the remote SMTP daemon,
probably C<sendmail>.

Or you might be able to use the CPAN module C<Mail::Mailer>:

    use Mail::Mailer;

    $mailer = Mail::Mailer->new();
    $mailer->open({ From => $from_address, To => $to_address, Subject => $subject })
        or die "Can't open: $!\n";
    print $mailer $body;
    $mailer->close();

The C<Mail::Internet> module uses C<Net::SMTP> which is less Unix-centric than
C<Mail::Mailer>, but less reliable. Avoid raw SMTP commands. There
are many reasons to use a mail transport agent like C<sendmail>. These
include queuing, MX records, and security.

=head2 How do I use MIME to make an attachment to a mail message?

This answer is extracted directly from the C<MIME::Lite> documentation.
Create a multipart message (i.e., one with attachments).

    use MIME::Lite;

    ### Create a new multipart message:
    $msg = MIME::Lite->new(
        From    =>'me@myhost.com',
        To      =>'you@yourhost.com',
        Cc      =>'some@other.com, some@more.com',
        Subject =>'A message with 2 parts...',
        Type    =>'multipart/mixed'
    );

    ### Add parts (each "attach" has same arguments as "new"):
    $msg->attach(Type     =>'TEXT',
                 Data     =>"Here's the GIF file you wanted"
    );
    $msg->attach(Type     =>'image/gif',
                 Path     =>'aaa000123.gif',
                 Filename =>'logo.gif'
    );

    $text = $msg->as_string;

C<MIME::Lite> also includes a C<send> method for sending these things.
This defaults to using L<sendmail> but can be customized to use
SMTP via L<Net::SMTP>.

=head2 How do I read mail?

While you could use the C<Mail::Folder> module from CPAN (part of the
C<MailFolder> package) or the C<Mail::Internet> module from CPAN (part
of the C<MailTools> package), often a module is overkill. Here's a
mail sorter.

    my (@msgs, @sub);
    my $msgno = -1;
    $/ = '';                    # paragraph reads
    while (<>) {
        if (/^From /m) {        # start of a new message
            /^Subject:\s*(?:Re:\s*)*(.*)/mi;
            $sub[++$msgno] = lc($1) || '';
        }
        $msgs[$msgno] .= $_;
    }
    for my $i (sort { $sub[$a] cmp $sub[$b] || $a <=> $b } (0 .. $#msgs)) {
        print $msgs[$i];
    }

Or more succinctly,

    #!/usr/bin/perl -n00
    # bysub2 - awkish sort-by-subject
    BEGIN { $msgno = -1 }
    $sub[++$msgno] = (/^Subject:\s*(?:Re:\s*)*(.*)/mi)[0] if /^From/m;
    $msg[$msgno] .= $_;
    END { print @msg[ sort { $sub[$a] cmp $sub[$b] || $a <=> $b } (0 .. $#msg) ] }

=head2 How do I find out my hostname, domainname, or IP address?
X<hostname, domainname, IP address, host, domain, hostfqdn, inet_ntoa,
gethostbyname, Socket, Net::Domain, Sys::Hostname>

(contributed by brian d foy)

The C<Net::Domain> module, which is part of the standard distribution starting
in perl5.7.3, can get you the fully qualified domain name (FQDN), the host
name, or the domain name.

    use Net::Domain qw(hostname hostfqdn hostdomain);

    my $host = hostfqdn();

The C<Sys::Hostname> module, included in the standard distribution since
perl5.6, can also get the hostname through its C<hostname> function.

To get the IP address, you can use the C<gethostbyname> built-in function
to turn the name into a number. To turn that number into the dotted octet
form (a.b.c.d) that most people expect, use the C<inet_ntoa> function
from the C<Socket> module, which also comes with perl.

    use Socket;

    my $address = inet_ntoa(
        scalar gethostbyname( $host || 'localhost' )
    );

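Putting the pieces together, here is a sketch that guards against names
that do not resolve, since C<inet_ntoa> would otherwise be handed an
undefined value. The sub name C<ipv4_address_for> is made up for
illustration, and only IPv4 is handled:

```perl
use strict;
use warnings;
use Sys::Hostname qw(hostname);
use Socket qw(inet_ntoa);

# Hypothetical helper: dotted-quad IPv4 address for a name, or undef
# if the name does not resolve.
sub ipv4_address_for {
    my ($name) = @_;
    my $packed = gethostbyname($name);   # scalar context: packed address
    return defined $packed ? inet_ntoa($packed) : undef;
}

my $host    = hostname();
my $address = ipv4_address_for($host);
print "$host => ", defined $address ? $address : '(unresolved)', "\n";
```
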
=head2 How do I fetch a news article or the active newsgroups?

Use the C<Net::NNTP> or C<News::NNTPClient> modules, both available from CPAN.
This can make tasks like fetching the newsgroup list as simple as

    perl -MNews::NNTPClient \
      -e 'print News::NNTPClient->new->list("newsgroups")'

=head2 How do I fetch/put an FTP file?

(contributed by brian d foy)

The C<LWP> family of modules (available on CPAN as the libwww-perl distribution)
can work with FTP just like it can with many other protocols. C<LWP::Simple>
makes it quite easy to fetch a file:

    use LWP::Simple;

    my $data = get( 'ftp://some.ftp.site/some/file.txt' );

If you want more direct or low-level control of the FTP process, you can use
the C<Net::FTP> module (in the Standard Library since Perl 5.8). Its
documentation has examples showing you just how to do that.

=head2 How can I do RPC in Perl?

(contributed by brian d foy)

Use one of the RPC modules you can find on CPAN (
http://search.cpan.org/search?query=RPC&mode=all ).

=head1 AUTHOR AND COPYRIGHT

Copyright (c) 1997-2010 Tom Christiansen, Nathan Torkington, and
other authors as noted. All rights reserved.

This documentation is free; you can redistribute it and/or modify it
under the same terms as Perl itself.

Irrespective of its distribution, all code examples in this file
are hereby placed into the public domain. You are permitted and
encouraged to use this code in your own programs for fun
or for profit as you see fit. A simple comment in the code giving
credit would be courteous but is not required.