=head1 NAME

perlfaq9 - Networking ($Revision: 1.26 $, $Date: 1999/05/23 16:08:30 $)

=head1 DESCRIPTION

This section deals with questions related to networking, the internet,
and a few on the web.

=head2 My CGI script runs from the command line but not the browser. (500 Server Error)

If you can demonstrate that you've read the following FAQs and that
your problem isn't something simple that can be easily answered, you'll
probably receive a courteous and useful reply to your question if you
post it on comp.infosystems.www.authoring.cgi (if it's something to do
with HTTP, HTML, or the CGI protocols). Questions that appear to be Perl
questions but are really CGI ones that are posted to comp.lang.perl.misc
may not be so well received.

The useful FAQs and related documents are:

    CGI FAQ
        http://www.webthing.com/tutorials/cgifaq.html

    Web FAQ
        http://www.boutell.com/faq/

    WWW Security FAQ
        http://www.w3.org/Security/Faq/

    HTTP Spec
        http://www.w3.org/pub/WWW/Protocols/HTTP/

    HTML Spec
        http://www.w3.org/TR/REC-html40/
        http://www.w3.org/pub/WWW/MarkUp/

    CGI Spec
        http://www.w3.org/CGI/

    CGI Security FAQ
        http://www.go2net.com/people/paulp/cgi-security/safe-cgi.txt

=head2 How can I get better error messages from a CGI program?

Use the CGI::Carp module. It replaces C<warn> and C<die>, plus the
normal Carp module's C<carp>, C<croak>, and C<confess> functions, with
more verbose and safer versions. It still sends them to the normal
server error log.

    use CGI::Carp;
    warn "This is a complaint";
    die "But this one is serious";

The following use of CGI::Carp also redirects errors to a file of your choice,
placed in a BEGIN block to catch compile-time warnings as well:

    BEGIN {
        use CGI::Carp qw(carpout);
        open(LOG, ">>/var/local/cgi-logs/mycgi-log")
            or die "Unable to append to mycgi-log: $!\n";
        carpout(*LOG);
    }

You can even arrange for fatal errors to go back to the client browser,
which is nice for your own debugging, but might confuse the end user.

    use CGI::Carp qw(fatalsToBrowser);
    die "Bad error here";

Even if the error happens before you get the HTTP header out, the module
will try to take care of this to avoid the dreaded server 500 errors.
Normal warnings still go out to the server error log (or wherever
you've sent them with C<carpout>) with the application name and date
stamp prepended.

=head2 How do I remove HTML from a string?

The most correct way (albeit not the fastest) is to use HTML::Parser
from CPAN. Another mostly correct
way is to use HTML::FormatText, which not only removes HTML but also
attempts to do a little simple formatting of the resulting plain text.

Many folks attempt a simple-minded regular expression approach, like
C<< s/<.*?>//g >>, but that fails in many cases because the tags
may continue over line breaks, they may contain quoted angle-brackets,
or HTML comments may be present. Plus, folks forget to convert
entities--like C<&lt;> for example.

Here's one "simple-minded" approach that works for most files:

    #!/usr/bin/perl -p0777
    s/<(?:[^>'"]*|(['"]).*?\1)*>//gs

If you want a more complete solution, see the 3-stage striphtml
program in
http://www.perl.com/CPAN/authors/Tom_Christiansen/scripts/striphtml.gz .

Here are some tricky cases that you should think about when picking
a solution:

    <IMG SRC = "foo.gif" ALT = "A > B">

    <IMG SRC = "foo.gif"
         ALT = "A > B">

    <!-- <A comment> -->

    <script>if (a<b && a>c)</script>

    <# Just data #>

    <![INCLUDE CDATA [ >>>>>>>>>>>> ]]>

If HTML comments include other tags, those solutions would also break
on text like this:

    <!-- This section commented out.
        <B>You can't see me!</B>
    -->

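To see where the "simple-minded" regex holds up and where it gives way,
it can help to run it over a few of the tricky cases. The following is
only a sketch: C<strip_tags> is a hypothetical wrapper name, and the
input strings are made up for illustration.

```perl
use strict;
use warnings;

# strip_tags() is a hypothetical helper wrapping the regex shown above.
sub strip_tags {
    my ($html) = @_;
    # A tag is "<", then any mix of unquoted text or quoted strings, then ">".
    $html =~ s/<(?:[^>'"]*|(['"]).*?\1)*>//gs;
    return $html;
}

# A quoted ">" and a tag split over two lines are both removed cleanly.
print strip_tags(qq{<IMG SRC = "foo.gif" ALT = "A > B">ok\n});
print strip_tags(qq{<IMG SRC = "foo.gif"\n     ALT = "A > B">ok\n});

# A comment that contains markup still leaks text into the result.
print strip_tags(qq{<!-- <B>hidden</B> -->\n});
```

The first two calls leave just C<ok>, while the commented-out markup
leaves C<< hidden --> >> behind, which is the kind of breakage a real
parser avoids.
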
=head2 How do I extract URLs?

A quick but imperfect approach is

    #!/usr/bin/perl -n00
    # qxurl - tchrist@perl.com
    print "$2\n" while m{
        < \s*
          A \s+ HREF \s* = \s* (["']) (.*?) \1
        \s* >
    }gsix;

This version does not adjust relative URLs, understand alternate
bases, deal with HTML comments, deal with HREF and NAME attributes
in the same tag, understand extra qualifiers like TARGET, or accept
URLs themselves as arguments. It also runs about 100x faster than a
more "complete" solution using the LWP suite of modules, such as the
http://www.perl.com/CPAN/authors/Tom_Christiansen/scripts/xurl.gz program.

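The limitations listed above are easy to see on a small sample. This
sketch wraps the same pattern in a hypothetical C<extract_hrefs>
function; the sample page and its URLs are invented for illustration.

```perl
use strict;
use warnings;

# extract_hrefs() is a hypothetical wrapper around the qxurl pattern.
sub extract_hrefs {
    my ($html) = @_;
    my @urls;
    while ($html =~ m{
        < \s*
          A \s+ HREF \s* = \s* (["']) (.*?) \1
        \s* >
    }gsix) {
        push @urls, $2;
    }
    return @urls;
}

my $page = <<'HTML';
<a href="http://www.perl.com/">Perl</a>
<A HREF='docs/index.html'>Docs</A>
<a target="_top" href="http://example.com/">missed</a>
HTML

# Prints the first two URLs; the third link has an attribute before HREF.
print "$_\n" for extract_hrefs($page);
```

The relative URL comes back unchanged, and the link with an extra
attribute ahead of HREF is skipped, as the caveats above predict.
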
=head2 How do I download a file from the user's machine? How do I open a file on another machine?

In the context of an HTML form, you can use what's known as
B<multipart/form-data> encoding. The CGI.pm module (available from
CPAN) supports this in the start_multipart_form() method, which isn't
the same as the startform() method.

=head2 How do I make a pop-up menu in HTML?

Use the B<< <SELECT> >> and B<< <OPTION> >> tags. The CGI.pm
module (available from CPAN) supports this widget, as well as many
others, including some that it cleverly synthesizes on its own.

=head2 How do I fetch an HTML file?

One approach, if you have the lynx text-based HTML browser installed
on your system, is this:

    $html_code = `lynx -source $url`;
    $text_data = `lynx -dump $url`;

The libwww-perl (LWP) modules from CPAN provide a more powerful way
to do this. They don't require lynx, but like lynx, can still work
through proxies:

    # simplest version
    use LWP::Simple;
    $content = get($URL);

    # or print HTML from a URL
    use LWP::Simple;
    getprint "http://www.linpro.no/lwp/";

    # or print ASCII from HTML from a URL
    # also need HTML-Tree package from CPAN
    use LWP::Simple;
    use HTML::TreeBuilder;
    use HTML::FormatText;
    my ($html, $ascii);
    $html = get("http://www.perl.com/");
    defined $html
        or die "Can't fetch HTML from http://www.perl.com/";
    $ascii = HTML::FormatText->new->format(
        HTML::TreeBuilder->new_from_content($html));
    print $ascii;

=head2 How do I automate an HTML form submission?

If you're submitting values using the GET method, create a URL and encode
the form using the C<query_form> method:

    use LWP::Simple;
    use URI::URL;

    my $url = url('http://www.perl.com/cgi-bin/cpan_mod');
    $url->query_form(module => 'DB_File', readme => 1);
    $content = get($url);

If you're using the POST method, create your own user agent and encode
the content appropriately.

    use HTTP::Request::Common qw(POST);
    use LWP::UserAgent;

    $ua = LWP::UserAgent->new();
    my $req = POST 'http://www.perl.com/cgi-bin/cpan_mod',
                   [ module => 'DB_File', readme => 1 ];
    $content = $ua->request($req)->as_string;

=head2 How do I decode or create those %-encodings on the web?

Here's an example of decoding:

    $string = "http://altavista.digital.com/cgi-bin/query?pg=q&what=news&fmt=.&q=%2Bcgi-bin+%2Bperl.exe";
    $string =~ s/%([a-fA-F0-9]{2})/chr(hex($1))/ge;

Encoding is a bit harder, because you can't just blindly change
all characters that are not letters, digits or underscores (C<\W>)
into their hex escapes.
It's important that characters with special meaning like C</> and C<?>
I<not> be translated. Probably the easiest way to get this right is
to avoid reinventing the wheel and just use the URI::Escape module,
available from CPAN.

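One way to keep C</> and C<?> intact is to escape each component (say,
one query value) on its own and then assemble the URL around it. The
sketch below illustrates that idea with two hypothetical helpers,
C<escape_component> and C<unescape_component>. It assumes byte strings
and the RFC 3986 "unreserved" character set, and it is not a substitute
for URI::Escape.

```perl
use strict;
use warnings;

# Hypothetical helpers; URI::Escape provides tested equivalents.
sub escape_component {
    my ($text) = @_;
    # Keep letters, digits and - . _ ~ as they are; escape every other byte.
    $text =~ s/([^A-Za-z0-9\-._~])/sprintf("%%%02X", ord($1))/ge;
    return $text;
}

sub unescape_component {
    my ($text) = @_;
    $text =~ tr/+/ /;                                # forms send spaces as "+"
    $text =~ s/%([a-fA-F0-9]{2})/chr(hex($1))/ge;    # then undo the %XX escapes
    return $text;
}

# Only the value is escaped; the "?" and "=" delimiters are added afterwards.
my $url = "http://www.example.com/cgi-bin/query?q="
        . escape_component("+cgi-bin +perl.exe");
print "$url\n";
print unescape_component("%2Bcgi-bin+%2Bperl.exe"), "\n";
```

Note that the C<+> to space translation applies to form data in a query
string, not to every part of a URL.
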
=head2 How do I redirect to another page?

According to RFC 2616, "Hypertext Transfer Protocol -- HTTP/1.1", the
preferred method is to send a C<Location:> header instead of a
C<Content-Type:> header:

    Location: http://www.domain.com/newpage

Note that relative URLs in these headers can cause strange effects
because of "optimizations" that servers do.

    $url = "http://www.perl.com/CPAN/";
    print "Location: $url\n\n";
    exit;

To target a particular frame in a frameset, include a C<Window-target:>
line in the header.

    print <<EOF;
    Location: http://www.domain.com/newpage
    Window-target: <FrameName>

    EOF

To be correct to the spec, each of those virtual newlines should
really be physical C<"\015\012"> sequences by the time your message is
received by the client browser. Except for NPH scripts, though, that
local newline should get translated by your server into standard form,
so you shouldn't have a problem here, even if you are stuck on MacOS.
Everybody else probably won't even notice.

=head2 How do I put a password on my web pages?

That depends. You'll need to read the documentation for your web
server, or perhaps check some of the other FAQs referenced above.

=head2 How do I edit my .htpasswd and .htgroup files with Perl?

The HTTPD::UserAdmin and HTTPD::GroupAdmin modules provide a
consistent OO interface to these files, regardless of how they're
stored. Databases may be text, dbm, Berkeley DB or any database with a
DBI compatible driver. HTTPD::UserAdmin supports files used by the
`Basic' and `Digest' authentication schemes. Here's an example:

    use HTTPD::UserAdmin ();
    HTTPD::UserAdmin
        ->new(DB => "/foo/.htpasswd")
        ->add($username => $password);

=head2 How do I make sure users can't enter values into a form that cause my CGI script to do bad things?

Read the CGI security FAQ, at
http://www-genome.wi.mit.edu/WWW/faqs/www-security-faq.html , and the
Perl/CGI FAQ at
http://www.perl.com/CPAN/doc/FAQs/cgi/perl-cgi-faq.html .

In brief: use tainting (see L<perlsec>), which makes sure that data
from outside your script (e.g., CGI parameters) are never used in
C<eval> or C<system> calls. In addition to tainting, never use the
single-argument form of system() or exec(). Instead, supply the
command and arguments as a list, which bypasses the shell entirely, so
globbing characters and other shell metacharacters in the arguments are
never interpreted.

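The effect of the list form can be checked without involving any real
external command. In this sketch the child process is just another copy
of the running perl (C<$^X>), used as a stand-in for whatever program
you would really call, and the hostile string is invented for
illustration.

```perl
use strict;
use warnings;

# A hostile "filename" of the kind a form field might carry.
my $arg = 'report.txt; echo INJECTED';

# List form: the program is run directly and no shell ever parses $arg.
# The child exits 0 only if it received exactly one argument that still
# contains the semicolon.
my @cmd = ($^X, '-e', 'exit(@ARGV == 1 && $ARGV[0] =~ /;/ ? 0 : 1)', $arg);
my $status = system(@cmd);

print $status == 0
    ? "child received the string as one literal argument\n"
    : "unexpected: the argument was altered or split\n";
```

Had the same string been interpolated into a single command string, a
shell would have treated the semicolon as a command separator.
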
=head2 How do I parse a mail header?

For a quick-and-dirty solution, try this approach derived
from L<perlfunc/split>:

    $/ = '';
    $header = <MSG>;
    $header =~ s/\n\s+/ /g;      # merge continuation lines
    %head = ( 'UNIX_FROM_LINE', split /^([-\w]+):\s*/m, $header );

That solution doesn't do well if, for example, you're trying to
maintain all the Received lines. A more complete approach is to use
the Mail::Header module from CPAN (part of the MailTools package).

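To try the quick-and-dirty approach without a mailbox at hand, the
message can be read from an in-memory filehandle. This is a sketch
only: the sample message is made up, and the final trimming step is an
addition that the snippet above does not perform.

```perl
use strict;
use warnings;

# Hypothetical sample message; in the snippet above it comes from <MSG>.
my $message = <<'EOM';
From alice@example.com Wed May  6 09:38:41 1998
From: Alice <alice@example.com>
Subject: a long subject
 that continues here
To: bob@example.com

Body text.
EOM

open(my $msg, '<', \$message) or die "Can't open in-memory file: $!\n";
my $header = do { local $/ = ''; <$msg> };   # paragraph mode: header block only
close($msg);

$header =~ s/\n\s+/ /g;                      # merge continuation lines
my %head = ('UNIX_FROM_LINE', split /^([-\w]+):\s*/m, $header);
s/\s+\z// for values %head;                  # trim trailing whitespace (added step)

print "$_: $head{$_}\n" for sort keys %head;
```

Because the result is a plain hash, a header that appears more than
once (such as Received) keeps only its last value, which is the
weakness mentioned above.
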
=head2 How do I decode a CGI form?

You use a standard module, probably CGI.pm. Under no circumstances
should you attempt to do so by hand!

You'll see a lot of CGI programs that blindly read from STDIN the number
of bytes equal to CONTENT_LENGTH for POSTs, or grab QUERY_STRING for
decoding GETs. These programs are very poorly written. They only work
sometimes. They typically forget to check the return value of the read()
system call, which is a cardinal sin. They don't handle HEAD requests.
They don't handle multipart forms used for file uploads. They don't deal
with GET/POST combinations where query fields are in more than one place.
They don't deal with keywords in the query string.

In short, they're bad hacks. Resist them at all costs. Please do not be
tempted to reinvent the wheel. Instead, use CGI.pm or CGI_Lite.pm
(available from CPAN), or if you're trapped in the module-free land
of perl1 .. perl4, you might look into cgi-lib.pl (available from
http://cgi-lib.stanford.edu/cgi-lib/ ).

Make sure you know whether to use a GET or a POST in your form.
GETs should only be used for something that doesn't update the server.
Otherwise you can get mangled databases and repeated feedback mail
messages. The fancy word for this is ``idempotency''. This simply
means that there should be no difference between making a GET request
for a particular URL once or multiple times. This is because the
HTTP protocol definition says that a GET request may be cached by the
browser, or server, or an intervening proxy. POST requests cannot be
cached, because each request is independent and matters. Typically,
POST requests change or depend on state on the server (query or update
a database, send mail, or purchase a computer).

=head2 How do I check a valid mail address?

You can't, at least, not in real time. Bummer, eh?

Without sending mail to the address and seeing whether there's a human
on the other end to answer you, you cannot determine whether a mail
address is valid. Even if you apply the mail header standard, you
can have problems, because there are deliverable addresses that aren't
RFC-822 (the mail header standard) compliant, and addresses that aren't
deliverable which are compliant.

Many are tempted to try to eliminate many frequently-invalid
mail addresses with a simple regex, such as
C</^[\w.-]+\@(?:[\w-]+\.)+\w+$/>. It's a very bad idea: it
throws out many valid addresses and says nothing about
potential deliverability, so it is not suggested. Instead, see
http://www.perl.com/CPAN/authors/Tom_Christiansen/scripts/ckaddr.gz ,
which actually checks against the full RFC spec (except for nested
comments), looks for addresses you may not wish to accept mail to
(say, Bill Clinton or your postmaster), and then makes sure that the
hostname given can be looked up in the DNS MX records. It's not fast,
but it works for what it tries to do.

Our best advice for verifying a person's mail address is to have them
enter their address twice, just as you normally do to change a password.
This usually weeds out typos. If both versions match, send
mail to that address with a personal message that looks somewhat like:

    Dear someuser@host.com,

    Please confirm the mail address you gave us Wed May  6 09:38:41
    MDT 1998 by replying to this message.  Include the string
    "Rumpelstiltskin" in that reply, but spelled in reverse; that is,
    start with "Nik...".  Once this is done, your confirmed address will
    be entered into our records.

If you get the message back and they've followed your directions,
you can be reasonably assured that it's real.

A related strategy that's less open to forgery is to give them a PIN
(personal ID number). Record the address and PIN (best that it be a
random one) for later processing. In the mail you send, ask them to
include the PIN in their reply. But if it bounces, or the message is
included via a ``vacation'' script, it'll be there anyway. So it's
best to ask them to mail back a slight alteration of the PIN, such as
with the characters reversed, one added or subtracted to each digit, etc.

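The reversed-PIN idea might be sketched as follows. The helper names
C<new_pin> and C<reply_confirms> are hypothetical, the six-digit length
is an arbitrary choice, and C<rand> is used only for brevity; treat this
as an outline of the bookkeeping rather than a finished scheme.

```perl
use strict;
use warnings;

# Hypothetical helpers for the PIN scheme described above.
sub new_pin {
    my $pin;
    # Draw six random digits, and retry if the PIN reads the same backwards,
    # since a bounce quoting such a PIN would look like a valid reply.
    do { $pin = join '', map { int rand 10 } 1 .. 6 }
        until $pin ne scalar reverse $pin;
    return $pin;
}

sub reply_confirms {
    my ($pin, $reply_body) = @_;
    my $expected = scalar reverse $pin;   # the alteration the user was asked for
    return index($reply_body, $expected) >= 0;
}

my $pin = new_pin();
print "Record this PIN with the address: $pin\n";
print "Expect this string in the reply:  ", scalar reverse($pin), "\n";
```

A bounce or vacation message that merely quotes the original mail
contains the PIN as sent, not its reversal, so C<reply_confirms> rejects
it.
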
=head2 How do I decode a MIME/BASE64 string?

The MIME-Base64 package (available from CPAN) handles this and a lot
more. Decoding BASE64 becomes as simple as:

    use MIME::Base64;
    $decoded = decode_base64($encoded);

A more direct approach is to use the unpack() function's "u"
format after minor transliterations:

    tr#A-Za-z0-9+/##cd;                   # remove non-base64 chars
    tr#A-Za-z0-9+/# -_#;                  # convert to uuencoded format
    $len = pack("c", 32 + 0.75*length);   # compute length byte
    print unpack("u", $len . $_);         # uudecode and print

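Both approaches can be compared on a short string. In this sketch
C<decode_short_base64> is a hypothetical wrapper around the unpack()
trick. As its name suggests, it is only meant for short inputs, because
the whole input is treated as a single uuencoded line with a one-byte
length prefix.

```perl
use strict;
use warnings;
use MIME::Base64 qw(encode_base64 decode_base64);

# Round trip with the module; the empty second argument suppresses the newline.
my $encoded = encode_base64("Hello", "");
print "$encoded\n";
print decode_base64($encoded), "\n";

# Hypothetical wrapper around the unpack("u") approach shown above.
sub decode_short_base64 {
    my ($text) = @_;
    $text =~ tr#A-Za-z0-9+/##cd;                 # remove non-base64 chars
    $text =~ tr#A-Za-z0-9+/# -_#;                # convert to uuencoded alphabet
    my $len = pack("c", 32 + int(0.75 * length($text)));   # length byte
    return unpack("u", $len . $text);            # uudecode
}

print decode_short_base64($encoded), "\n";
```

For anything longer than a line or two of encoded text, the module is
the safer choice.
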
=head2 How do I return the user's mail address?

On systems that support getpwuid, the $< variable, and the
Sys::Hostname module (which is part of the standard perl distribution),
you can probably try using something like this:

    use Sys::Hostname;
    $address = sprintf('%s@%s', scalar getpwuid($<), hostname);

Company policies on mail addresses can mean that this generates addresses
that the company's mail system will not accept, so you should ask for
users' mail addresses when this matters. Furthermore, not all systems
on which Perl runs are so forthcoming with this information as is Unix.

The Mail::Util module from CPAN (part of the MailTools package) provides a
mailaddress() function that tries to guess the mail address of the user.
It makes a more intelligent guess than the code above, using information
given when the module was installed, but it could still be incorrect.
Again, the best way is often just to ask the user.

=head2 How do I send mail?

Use the C<sendmail> program directly:

    open(SENDMAIL, "|/usr/lib/sendmail -oi -t -odq")
                        or die "Can't fork for sendmail: $!\n";
    print SENDMAIL <<"EOF";
    From: User Originating Mail <me\@host>
    To: Final Destination <you\@otherhost>
    Subject: A relevant subject line

    Body of the message goes here after the blank line
    in as many lines as you like.
    EOF
    close(SENDMAIL)     or warn "sendmail didn't close nicely";

The B<-oi> option prevents sendmail from interpreting a line consisting
of a single dot as "end of message". The B<-t> option says to use the
headers to decide who to send the message to, and B<-odq> says to put
the message into the queue. This last option means your message won't
be immediately delivered, so leave it out if you want immediate
delivery.

Alternate, less convenient approaches include calling mail (sometimes
called mailx) directly or simply opening up port 25 and having an
intimate conversation between just you and the remote SMTP daemon,
probably sendmail.

Or you might be able to use the CPAN module Mail::Mailer:

    use Mail::Mailer;

    $mailer = Mail::Mailer->new();
    $mailer->open({ From    => $from_address,
                    To      => $to_address,
                    Subject => $subject,
                  })
        or die "Can't open: $!\n";
    print $mailer $body;
    $mailer->close();

The Mail::Internet module uses Net::SMTP which is less Unix-centric than
Mail::Mailer, but less reliable. Avoid raw SMTP commands. There
are many reasons to use a mail transport agent like sendmail. These
include queueing, MX records, and security.

=head2 How do I read mail?

While you could use the Mail::Folder module from CPAN (part of the
MailFolder package) or the Mail::Internet module from CPAN (part
of the MailTools package), often a module is overkill. Here's a
mail sorter.

    #!/usr/bin/perl
    # bysub1 - simple sort by subject
    my(@msgs, @sub);
    my $msgno = -1;
    $/ = '';                    # paragraph reads
    while (<>) {
        if (/^From/m) {
            /^Subject:\s*(?:Re:\s*)*(.*)/mi;
            $sub[++$msgno] = lc($1) || '';
        }
        $msgs[$msgno] .= $_;
    }
    for my $i (sort { $sub[$a] cmp $sub[$b] || $a <=> $b } (0 .. $#msgs)) {
        print $msgs[$i];
    }

Or more succinctly,

    #!/usr/bin/perl -n00
    # bysub2 - awkish sort-by-subject
    BEGIN { $msgno = -1 }
    $sub[++$msgno] = (/^Subject:\s*(?:Re:\s*)*(.*)/mi)[0] if /^From/m;
    $msg[$msgno] .= $_;
    END { print @msg[ sort { $sub[$a] cmp $sub[$b] || $a <=> $b } (0 .. $#msg) ] }

=head2 How do I find out my hostname/domainname/IP address?

The normal way to find your own hostname is to call the C<`hostname`>
program. While sometimes expedient, this has some problems, such as
not knowing whether you've got the canonical name or not. It's one of
those tradeoffs of convenience versus portability.

The Sys::Hostname module (part of the standard perl distribution) will
give you the hostname after which you can find out the IP address
(assuming you have working DNS) with a gethostbyname() call.

    use Socket;
    use Sys::Hostname;
    my $host = hostname();
    my $addr = inet_ntoa(scalar gethostbyname($host || 'localhost'));

Probably the simplest way to learn your DNS domain name is to grok
it out of /etc/resolv.conf, at least under Unix. Of course, this
assumes several things about your resolv.conf configuration, including
that it exists.

(We still need a good DNS domain name-learning method for non-Unix
systems.)

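For the resolv.conf route, something along these lines may be enough on
many Unix systems. It is a sketch under stated assumptions:
C<domain_from_resolv> is a hypothetical helper, it prefers a C<domain>
line and otherwise falls back to the first name on a C<search> line, and
it takes the file's text as an argument so that it can be tried on a
sample string rather than the real file.

```perl
use strict;
use warnings;

# Hypothetical helper: pull a domain name out of resolv.conf-style text.
sub domain_from_resolv {
    my ($text) = @_;
    my ($domain, $first_search);
    for my $line (split /\n/, $text) {
        next if $line =~ /^\s*[#;]/;             # skip comment lines
        if ($line =~ /^\s*domain\s+(\S+)/) {
            $domain = $1;
        }
        elsif (!defined $first_search && $line =~ /^\s*search\s+(\S+)/) {
            $first_search = $1;                  # first entry of the search list
        }
    }
    # Prefer an explicit "domain" line; may return undef if neither is present.
    return defined $domain ? $domain : $first_search;
}

my $sample = "# sample\nsearch example.com corp.example.com\nnameserver 192.0.2.1\n";
print domain_from_resolv($sample), "\n";
```

To use it on a live system, read /etc/resolv.conf into a string first
and be prepared for the file to be missing or to contain neither line.
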
=head2 How do I fetch a news article or the active newsgroups?

Use the Net::NNTP or News::NNTPClient modules, both available from CPAN.
This can make tasks like fetching the newsgroup list as simple as

    perl -MNews::NNTPClient \
      -e 'print News::NNTPClient->new->list("newsgroups")'

=head2 How do I fetch/put an FTP file?

LWP::Simple (available from CPAN) can fetch but not put. Net::FTP (also
available from CPAN) is more complex but can put as well as fetch.

=head2 How can I do RPC in Perl?

A DCE::RPC module is being developed (but is not yet available) and
will be released as part of the DCE-Perl package (available from
CPAN). The rpcgen suite, available from CPAN/authors/id/JAKE/, is
an RPC stub generator and includes an RPC::ONC module.

=head1 AUTHOR AND COPYRIGHT

Copyright (c) 1997-1999 Tom Christiansen and Nathan Torkington.
All rights reserved.

When included as part of the Standard Version of Perl, or as part of
its complete documentation whether printed or otherwise, this work
may be distributed only under the terms of Perl's Artistic License.
Any distribution of this file or derivatives thereof I<outside>
of that package requires that special arrangements be made with
the copyright holder.

Irrespective of its distribution, all code examples in this file
are hereby placed into the public domain. You are permitted and
encouraged to use this code in your own programs for fun
or for profit as you see fit. A simple comment in the code giving
credit would be courteous but is not required.
553or for profit as you see fit. A simple comment in the code giving
554credit would be courteous but is not required.