=head1 NAME

perlfaq9 - Networking ($Revision: 1.17 $, $Date: 1997/04/24 22:44:29 $)

=head1 DESCRIPTION

This section deals with questions related to networking, the internet,
and a few on the web.

=head2 My CGI script runs from the command line but not the browser. Can you help me fix it?

Sure, but you probably can't afford our contracting rates :-)

Seriously, if you can demonstrate that you've read the following FAQs
and that your problem isn't something simple that can be easily
answered, you'll probably receive a courteous and useful reply to your
question if you post it on comp.infosystems.www.authoring.cgi (if it's
something to do with HTTP, HTML, or the CGI protocols). Questions
posted to comp.lang.perl.misc that look like Perl questions but are
really about CGI may not be so well received.

The useful FAQs are:

    http://www.perl.com/perl/faq/idiots-guide.html
    http://www3.pair.com/webthing/docs/cgi/faqs/cgifaq.shtml
    http://www.perl.com/perl/faq/perl-cgi-faq.html
    http://www-genome.wi.mit.edu/WWW/faqs/www-security-faq.html
    http://www.boutell.com/faq/

=head2 How do I remove HTML from a string?

The most correct way (albeit not the fastest) is to use HTML::Parse
from CPAN (part of the libwww-perl distribution, which is a must-have
module for all web hackers).

Many folks attempt a simple-minded regular expression approach, like
C<s/E<lt>.*?E<gt>//g>, but that fails in many cases because the tags
may continue over line breaks, they may contain quoted angle-brackets,
or HTML comments may be present. Plus folks forget to convert
entities, like C<&lt;> for example.

Here's one "simple-minded" approach that works for most files:

    #!/usr/bin/perl -p0777
    s/<(?:[^>'"]*|(['"]).*?\1)*>//gs

If you want a more complete solution, see the 3-stage striphtml
program at
http://www.perl.com/CPAN/authors/Tom_Christiansen/scripts/striphtml.gz .

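The regex above can also be wrapped in a subroutine that removes
comments first (so a C<E<gt>> inside a comment can't end a "tag"
early) and decodes a few common entities. This is a minimal sketch:
C<strip_html> is a hypothetical helper name, and the entity list is
deliberately incomplete, so for real pages a proper parser is still
the better choice.

    use strict;
    use warnings;

    # strip_html() is a hypothetical helper, not a CPAN function.
    sub strip_html {
        my ($html) = @_;
        $html =~ s/<!--.*?-->//gs;                   # drop comments first
        $html =~ s/<(?:[^>'"]*|(['"]).*?\1)*>//gs;   # drop tags; a quoted '>' is skipped
        my %entity = (lt => '<', gt => '>', quot => '"', amp => '&');
        $html =~ s/&(lt|gt|quot|amp);/$entity{$1}/g; # decode only these four entities
        return $html;
    }

    print strip_html(qq{<p title="a > b">5 &lt; 6</p><!-- x > y -->}), "\n";

Because the entity substitution is a single pass, C<&amp;lt;> correctly
becomes C<&lt;> rather than C<E<lt>>.
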
=head2 How do I extract URLs?

A quick but imperfect approach is:

    #!/usr/bin/perl -n00
    # qxurl - tchrist@perl.com
    print "$2\n" while m{
        < \s*
          A \s+ HREF \s* = \s* (["']) (.*?) \1
        \s* >
    }gsix;

This version does not adjust relative URLs, understand alternate
bases, deal with HTML comments, deal with HREF and NAME attributes in
the same tag, or accept URLs themselves as arguments. On the other
hand, it runs about 100x faster than a more "complete" solution using
the LWP suite of modules, such as the
http://www.perl.com/CPAN/authors/Tom_Christiansen/scripts/xurl.gz
program.

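If you want the links collected into an array rather than printed, a
minimal sketch along the same lines might look like this.
C<extract_hrefs> is a hypothetical helper; unlike qxurl it skips HTML
comments, accepts unquoted HREF values, and tolerates other attributes
(such as NAME) before HREF, but it still returns relative URLs
unresolved.

    use strict;
    use warnings;

    # extract_hrefs() is a hypothetical helper, not part of any module.
    sub extract_hrefs {
        my ($html) = @_;
        $html =~ s/<!--.*?-->//gs;    # ignore commented-out links
        my @urls;
        while ($html =~ m{
            < \s* A \b [^>]*? \b HREF \s* = \s*
            (?: (["']) (.*?) \1       # a quoted value
              | ([^\s>]+)             # or an unquoted one
            )
        }gsix) {
            push @urls, defined $2 ? $2 : $3;
        }
        return @urls;
    }

    my $page = qq{<a name="top" href='intro.html'>Intro</a>\n}
             . qq{<!-- <a href="old.html">Old</a> -->\n}
             . qq{<A HREF=faq.html>FAQ</A>\n};
    print "$_\n" for extract_hrefs($page);

The comments inside the C</x> pattern avoid C<$> and braces on
purpose: Perl interpolates variables and finds the closing delimiter
before the regex engine ever sees those comments.
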
=head2 How do I download a file from the user's machine? How do I open a file on another machine?

In the context of an HTML form, you can use what's known as
B<multipart/form-data> encoding. The CGI.pm module (available from
CPAN) supports this in the start_multipart_form() method, which isn't
the same as the startform() method.

=head2 How do I make a pop-up menu in HTML?

Use the B<E<lt>SELECTE<gt>> and B<E<lt>OPTIONE<gt>> tags. The CGI.pm
module (available from CPAN) supports this widget, as well as many
others, including some that it cleverly synthesizes on its own.

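Without CGI.pm, the markup is easy enough to build by hand. A minimal
sketch, assuming you only need a field name, a list of values, and an
optional default; C<popup_menu_html> and C<html_escape> are
hypothetical helpers (CGI.pm's own method is popup_menu()), and every
value is HTML-escaped because menu values often come from user data.

    use strict;
    use warnings;

    # Hypothetical helpers; this is not CGI.pm's API.
    sub html_escape {
        my ($s) = @_;
        $s =~ s/&/&amp;/g;    # must run first
        $s =~ s/</&lt;/g;
        $s =~ s/>/&gt;/g;
        $s =~ s/"/&quot;/g;
        return $s;
    }

    sub popup_menu_html {
        my ($name, $values, $default) = @_;
        my $html = sprintf qq{<SELECT NAME="%s">\n}, html_escape($name);
        for my $v (@$values) {
            my $sel = (defined $default && $v eq $default) ? ' SELECTED' : '';
            $html .= sprintf qq{<OPTION VALUE="%s"%s>%s\n},
                html_escape($v), $sel, html_escape($v);
        }
        return $html . "</SELECT>\n";
    }

    print popup_menu_html('color', ['red', 'green', 'blue'], 'green');
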
=head2 How do I fetch an HTML file?

One approach, if you have the lynx text-based HTML browser installed
on your system, is this (since C<$url> is passed through the shell,
make sure it doesn't come from an untrusted source):

    $html_code = `lynx -source $url`;
    $text_data = `lynx -dump $url`;

The libwww-perl (LWP) modules from CPAN provide a more powerful way to
do this. They work through proxies, and don't require lynx:

    # print HTML from a URL
    use LWP::Simple;
    getprint "http://www.sn.no/libwww-perl/";

    # print ASCII from HTML from a URL
    use LWP::Simple;
    use HTML::Parse;
    use HTML::FormatText;
    my ($html, $ascii);
    $html = get("http://www.perl.com/");
    defined $html
        or die "Can't fetch HTML from http://www.perl.com/";
    $ascii = HTML::FormatText->new->format(parse_html($html));
    print $ascii;

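On perl 5.14 and later, the core HTTP::Tiny module is another option
when LWP isn't installed. A minimal sketch, assuming a plain HTTP GET
is all you need; C<fetch_html> is a hypothetical wrapper name, and
fetching https URLs additionally requires IO::Socket::SSL.

    use strict;
    use warnings;
    use HTTP::Tiny;    # core since perl 5.14

    # fetch_html() is a hypothetical helper name.
    sub fetch_html {
        my ($url) = @_;
        my $res = HTTP::Tiny->new(timeout => 30)->get($url);
        die "Can't fetch $url: $res->{status} $res->{reason}\n"
            unless $res->{success};
        return $res->{content};
    }

    # Usage (needs network access):
    #   my $html = fetch_html("http://www.perl.com/");

HTTP::Tiny reports failures through the returned hash rather than by
dying, which is why the wrapper checks C<success> itself.
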
=head2 How do I decode or create those %-encodings on the web?

Here's an example of decoding:

    $string = "http://altavista.digital.com/cgi-bin/query?pg=q&what=news&fmt=.&q=%2Bcgi-bin+%2Bperl.exe";
    $string =~ s/%([a-fA-F0-9]{2})/chr(hex($1))/ge;

Encoding is a bit harder, because you can't just blindly change
all the non-alphanumunder characters (C<\W>) into their hex escapes.
It's important that characters with special meaning like C</> and C<?>
I<not> be translated. Probably the easiest way to get this right is
to avoid reinventing the wheel and just use the URI::Escape module,
which is part of the libwww-perl package (LWP) available from CPAN.

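If URI::Escape isn't available, a hand-rolled encoder has to decide
which characters to leave alone. A minimal sketch, assuming byte (not
wide-character) strings: C<pct_encode> escapes everything except
letters, digits, C<-_.~>, and a caller-supplied set of characters that
should keep their special meaning (C<:/?#&=> by default). Both helpers
are hypothetical, and their interface is not URI::Escape's.

    use strict;
    use warnings;

    # Hypothetical helpers; they assume byte strings (encode wide
    # characters to UTF-8 first).
    sub pct_encode {
        my ($str, $keep) = @_;
        $keep = ':/?#&=' unless defined $keep;    # delimiters left as-is
        my $safe = quotemeta $keep;
        $str =~ s/([^A-Za-z0-9\-_.~$safe])/sprintf('%%%02X', ord $1)/ge;
        return $str;
    }

    sub pct_decode {
        my ($str) = @_;
        $str =~ s/%([a-fA-F0-9]{2})/chr(hex($1))/ge;
        return $str;
    }

    print pct_encode('http://www.perl.com/cgi-bin/q?what=perl faq'), "\n";
    # Encoding a single value: pass '' so "/" and "?" are escaped too.
    print pct_encode('a/b?c', ''), "\n";

Whether C</> and C<?> should survive depends on context: they keep
their meaning in a whole URL, but inside one query value they must be
escaped.
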
=head2 How do I redirect to another page?

Instead of sending back a C<Content-Type:> header with your reply,
send back a C<Location:> header. Officially this should be a
C<URI:> header, so the CGI.pm module (available from CPAN) sends back
both:

    Location: http://www.domain.com/newpage
    URI: http://www.domain.com/newpage

Note that relative URLs in these headers can cause strange effects
because of "optimizations" that servers do.

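Without CGI.pm, a script can print those headers itself. A minimal
sketch; C<redirect_headers> is a hypothetical helper that emits both
the C<Location:> and C<URI:> headers and, heeding the warning about
relative URLs, refuses anything that doesn't begin with a scheme.

    use strict;
    use warnings;

    # redirect_headers() is a hypothetical helper name.
    sub redirect_headers {
        my ($url) = @_;
        die "Refusing relative URL '$url'\n"
            unless $url =~ m{^[A-Za-z][A-Za-z0-9+.-]*://};
        # Header lines end with CRLF; the empty line ends the headers.
        return "Location: $url\r\nURI: $url\r\n\r\n";
    }

    print redirect_headers('http://www.domain.com/newpage');
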
=head2 How do I put a password on my web pages?

That depends. You'll need to read the documentation for your web
server, or perhaps check some of the other FAQs referenced above.

=head2 How do I edit my .htpasswd and .htgroup files with Perl?

The HTTPD::UserAdmin and HTTPD::GroupAdmin modules provide a
consistent OO interface to these files, regardless of how they're
stored. Databases may be text, dbm, Berkeley DB or any database with a
DBI compatible driver. HTTPD::UserAdmin supports files used by the
`Basic' and `Digest' authentication schemes. Here's an example:

    use HTTPD::UserAdmin ();
    HTTPD::UserAdmin
        ->new(DB => "/foo/.htpasswd")
        ->add($username => $password);

=head2 How do I make sure users can't enter values into a form that cause my CGI script to do bad things?

Read the CGI security FAQ, at
http://www-genome.wi.mit.edu/WWW/faqs/www-security-faq.html, and the
Perl/CGI FAQ at
http://www.perl.com/CPAN/doc/FAQs/cgi/perl-cgi-faq.html.

In brief: use tainting (see L<perlsec>), which makes sure that data
from outside your script (e.g., CGI parameters) can't be used in
C<eval> or C<system> calls until you have explicitly untainted it. In
addition to tainting, never use the single-argument form of system()
or exec(). Instead, supply the command and arguments as a list, which
keeps the shell from interpreting globs and other metacharacters.

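Both ideas together might look like the following minimal sketch,
assuming the script is run with C<perl -T> and that a plain filename
is the only thing you want to accept (the whitelist pattern is an
assumption; adjust it to your data). C<untaint_filename> and
C<count_lines> are hypothetical helpers. Extracting the value through
a capture group is what untaints it, and the list form of system()
hands the arguments to C<wc> directly, with no shell in between.

    use strict;
    use warnings;

    # Under -T, PATH and friends are tainted too, so reset them before
    # running any external command.
    $ENV{PATH} = '/bin:/usr/bin';
    delete @ENV{qw(IFS CDPATH ENV BASH_ENV)};

    # Accept only word characters, dots and dashes, and require a
    # leading word character so the value can't pose as an option.
    # The capture is what untaints the value.
    sub untaint_filename {
        my ($value) = @_;
        my ($clean) = $value =~ /\A(\w[\w.-]*)\z/
            or die "Refusing suspicious filename\n";
        return $clean;
    }

    # List form: no shell sees the arguments, so characters such as
    # ";" or "*" would never be interpreted even if they got through.
    sub count_lines {
        my ($file) = @_;
        my $safe = untaint_filename($file);
        system('wc', '-l', $safe) == 0
            or warn "wc failed: $?\n";
    }
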
=head2 How do I parse an email header?

For a quick-and-dirty solution, try this code derived
from page 222 of the 2nd edition of "Programming Perl":

    $/ = '';                     # paragraph mode: read up to a blank line
    $header = <MSG>;             # MSG is an already-open filehandle
    $header =~ s/\n\s+/ /g;      # merge continuation lines
    %head = ( 'UNIX_FROM_LINE', split /^([-\w]+):\s*/m, $header );

That solution doesn't do well if, for example, you're trying to
maintain all the Received lines. A more complete approach is to use
the Mail::Header module from CPAN (part of the MailTools package).

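To keep repeated fields such as Received without pulling in MailTools,
a small variation can store each field's values in an array. This is
a minimal sketch, assuming the header has already been read into a
string (for example with C<$/ = ''> as above); C<parse_header> is a
hypothetical helper, field names are lowercased, and the Unix "From "
line is simply skipped.

    use strict;
    use warnings;

    # parse_header() is a hypothetical helper, not Mail::Header's API.
    sub parse_header {
        my ($text) = @_;
        $text =~ s/\n[ \t]+/ /g;    # unfold continuation lines
        my %head;
        for my $line (split /\n/, $text) {
            # Skips the Unix "From " line, which has no colon after the name.
            next unless $line =~ /^([-\w]+):\s*(.*)/;
            push @{ $head{lc $1} }, $2;
        }
        return \%head;
    }

    my $head = parse_header("From gnat Thu Apr 24 1997\n"
                          . "Received: from a\n\tby b\n"
                          . "Received: from c\n"
                          . "Subject: Testing\n");
    print scalar @{ $head->{received} }, " Received lines\n";
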
=head2 How do I decode a CGI form?

A lot of people are tempted to code this up themselves, so you've
probably all seen a lot of code involving C<$ENV{CONTENT_LENGTH}> and
C<$ENV{QUERY_STRING}>. It's true that this can work, but there are
also a lot of versions of this floating around that are quite simply
broken!

Please do not be tempted to reinvent the wheel. Instead, use the
CGI.pm or CGI_Lite.pm modules (available from CPAN), or if you're
trapped in the module-free land of perl1 .. perl4, you might look into
cgi-lib.pl (available from http://www.bio.cam.ac.uk/web/form.html).

=head2 How do I check a valid email address?

You can't.

Without sending mail to the address and seeing whether it bounces (and
even then you face the halting problem), you cannot determine whether
an email address is valid. Even if you apply the email header
standard, you can have problems, because there are deliverable
addresses that aren't RFC-822 (the mail header standard) compliant,
and addresses that aren't deliverable which are compliant.

Many are tempted to weed out frequently-invalid email addresses with a
simple regexp, such as C</^[\w.-]+\@(?:[\w-]+\.)+\w+$/>. However, this
also throws out many valid ones, and says nothing about potential
deliverability, so it is not suggested. Instead, see
http://www.perl.com/CPAN/authors/Tom_Christiansen/scripts/ckaddr.gz ,
which actually checks against the full RFC spec (except for nested
comments), looks for addresses you may not wish to accept email to
(say, Bill Clinton or your postmaster), and then makes sure that the
hostname given can be looked up in DNS. It's not fast, but it works.

Here's an alternative strategy used by many CGI script authors: Check
the email address with a simple regexp (such as the one above). If
the regexp matched the address, accept the address. If the regexp
didn't match the address, request confirmation from the user that the
email address they entered was correct.

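That accept-or-confirm strategy might be sketched as follows. The
pattern is a deliberately loose assumption that rejects some valid
addresses (quoted local parts, for instance), which is exactly why a
failed match leads to a question rather than a rejection;
C<looks_plausible> is a hypothetical helper.

    use strict;
    use warnings;

    # A loose syntactic check only: it says nothing about deliverability.
    sub looks_plausible {
        my ($addr) = @_;
        return $addr =~ /^[\w.-]+\@(?:[\w-]+\.)+\w+$/ ? 1 : 0;
    }

    for my $addr ('gnat@frii.com', 'tchrist at perl dot com') {
        if (looks_plausible($addr)) {
            print "accept $addr\n";
        } else {
            # In a CGI script you would redisplay the form here and ask
            # the user to confirm or correct the address.
            print "ask the user to confirm $addr\n";
        }
    }
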
=head2 How do I decode a MIME/BASE64 string?

The MIME-tools package (available from CPAN) handles this and a lot
more. Decoding BASE64 becomes as simple as:

    use MIME::Base64;
    $decoded = decode_base64($encoded);

A more direct approach is to use the unpack() function's "u"
format after minor transliterations (this code works on the string
in C<$_>):

    tr#A-Za-z0-9+/##cd;                   # remove non-base64 chars
    tr#A-Za-z0-9+/# -_#;                  # convert to uuencoded format
    $len = pack("c", 32 + 0.75*length);   # compute length byte
    print unpack("u", $len . $_);         # uudecode and print

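MIME::Base64 has also shipped with perl itself since 5.8. A minimal
sketch comparing it with the unpack() trick, assuming a short input:
the trick builds a single uuencoded line, so it only suits short
strings. C<uu_base64_decode> is a hypothetical wrapper around the
tr/unpack code above.

    use strict;
    use warnings;
    use MIME::Base64 qw(encode_base64 decode_base64);

    # Hypothetical wrapper around the tr/unpack("u") approach.
    sub uu_base64_decode {
        local $_ = shift;
        tr#A-Za-z0-9+/##cd;                      # remove non-base64 chars
        tr#A-Za-z0-9+/# -_#;                     # convert to uuencoded format
        my $len = pack("c", 32 + 0.75*length);   # compute length byte
        return unpack("u", $len . $_);           # uudecode
    }

    my $encoded = encode_base64('Hello, world', '');   # '' = no trailing newline
    print "$encoded\n";
    print decode_base64($encoded), "\n";
    print uu_base64_decode($encoded), "\n";
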
=head2 How do I return the user's email address?

On systems that support getpwuid, the $E<lt> variable and the
Sys::Hostname module (which is part of the standard perl distribution),
you can try something like this:

    use Sys::Hostname;
    # scalar() makes getpwuid return just the login name
    $address = sprintf('%s@%s', scalar getpwuid($<), hostname);

Company policies on email addresses can mean that this generates
addresses that the company's email system will not accept, so you
should ask for users' email addresses when this matters. Furthermore,
not all systems on which Perl runs are as forthcoming with this
information as Unix is.

The Mail::Util module from CPAN (part of the MailTools package) provides a
mailaddress() function that tries to guess the mail address of the user.
It makes a more intelligent guess than the code above, using information
given when the module was installed, but it could still be incorrect.
Again, the best way is often just to ask the user.

=head2 How do I send/read mail?

Sending mail: the Mail::Mailer module from CPAN (part of the MailTools
package) is UNIX-centric, while Mail::Internet uses Net::SMTP which is
not UNIX-centric. Reading mail: use the Mail::Folder module from CPAN
(part of the MailFolder package) or the Mail::Internet module from
CPAN (also part of the MailTools package).

    # sending mail
    use Mail::Internet;
    use Mail::Header;
    # say which mail host to use
    $ENV{SMTPHOSTS} = 'mail.frii.com';
    # create headers
    $header = Mail::Header->new;
    $header->add('From', 'gnat@frii.com');
    $header->add('Subject', 'Testing');
    $header->add('To', 'gnat@frii.com');
    # create body (a reference to an array of lines)
    $body = 'This is a test, ignore';
    # create mail object
    $mail = Mail::Internet->new(undef, Header => $header, Body => [$body]);
    # send it
    $mail->smtpsend or die "smtpsend failed\n";

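Net::SMTP, which Mail::Internet relies on, is part of libnet and has
been bundled with perl since 5.8, so it can also be used directly. A
minimal sketch; the host and addresses are placeholders, and
C<send_mail> is a hypothetical helper.

    use strict;
    use warnings;
    use Net::SMTP;

    # send_mail() is a hypothetical helper; 'mail.example.com' below is
    # a placeholder for your own SMTP relay.
    sub send_mail {
        my (%arg) = @_;
        my $smtp = Net::SMTP->new($arg{host}, Timeout => 30)
            or die "Can't connect to $arg{host}: $@\n";
        $smtp->mail($arg{from}) or die "MAIL FROM rejected\n";
        $smtp->to($arg{to})     or die "RCPT TO rejected\n";
        $smtp->data;
        $smtp->datasend("From: $arg{from}\n");
        $smtp->datasend("To: $arg{to}\n");
        $smtp->datasend("Subject: $arg{subject}\n\n");
        $smtp->datasend("$arg{body}\n");
        $smtp->dataend or die "Message not accepted\n";
        $smtp->quit;
    }

    # send_mail(host => 'mail.example.com', from => 'you@example.com',
    #           to => 'them@example.com', subject => 'Testing',
    #           body => 'This is a test, ignore');

The envelope (mail() and to()) and the visible From:/To: headers are
separate things in SMTP, which is why both are supplied.
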
=head2 How do I find out my hostname/domainname/IP address?

A lot of code has historically cavalierly called the C<`hostname`>
program. While sometimes expedient, this isn't very portable. It's
one of those tradeoffs of convenience versus portability.

The Sys::Hostname module (part of the standard perl distribution) will
give you the hostname, after which you can find out the IP address
(assuming you have working DNS) with a gethostbyname() call.

    use Socket;
    use Sys::Hostname;
    my $host = hostname();
    my $addr = inet_ntoa(scalar gethostbyname($host || 'localhost'));

Probably the simplest way to learn your DNS domain name is to grok
it out of /etc/resolv.conf, at least under Unix. Of course, this
assumes several things about your resolv.conf configuration, including
that it exists.

(We still need a good DNS domain name-learning method for non-Unix
systems.)

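Reading /etc/resolv.conf as suggested above might look like this
minimal sketch, which uses the C<domain> line and otherwise falls back
to the first entry of a C<search> line. C<dns_domain> is a
hypothetical helper, and it returns undef when the file is missing or
contains neither line.

    use strict;
    use warnings;

    # dns_domain() is a hypothetical helper; Unix-only by nature.
    sub dns_domain {
        my ($file) = @_;
        $file = '/etc/resolv.conf' unless defined $file;
        open(my $fh, '<', $file) or return undef;
        my ($domain, $search);
        while (my $line = <$fh>) {
            if ($line =~ /^\s*domain\s+(\S+)/) {
                $domain = $1;
            } elsif (!defined $search && $line =~ /^\s*search\s+(\S+)/) {
                $search = $1;    # first search entry only
            }
        }
        close $fh;
        return defined $domain ? $domain : $search;
    }

    my $domain = dns_domain();
    print defined $domain ? "$domain\n" : "no domain found\n";
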
=head2 How do I fetch a news article or the active newsgroups?

Use the Net::NNTP or News::NNTPClient modules, both available from CPAN.
This can make tasks like fetching the newsgroup list as simple as:

    perl -MNews::NNTPClient \
      -e 'print News::NNTPClient->new->list("newsgroups")'

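Net::NNTP (part of libnet) can also fetch an individual article. A
minimal sketch; the server name is a placeholder, the server must let
you read the group, and C<first_article> is a hypothetical helper.

    use strict;
    use warnings;
    use Net::NNTP;

    # first_article() is a hypothetical helper; 'news.example.com' below
    # is a placeholder server name.
    sub first_article {
        my ($server, $group) = @_;
        my $nntp = Net::NNTP->new($server, Timeout => 30)
            or die "Can't connect to $server: $@\n";
        # In list context group() returns (count, first, last, name).
        my ($count, $first) = $nntp->group($group)
            or die "No such group $group\n";
        # Array ref of lines, or undef (e.g. if the article has expired).
        my $lines = $nntp->article($first);
        $nntp->quit;
        return $lines ? join('', @$lines) : undef;
    }

    # print first_article('news.example.com', 'comp.lang.perl.misc');
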
=head2 How do I fetch/put an FTP file?

LWP::Simple (available from CPAN) can fetch but not put. Net::FTP (also
available from CPAN) is more complex but can put as well as fetch.

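A minimal Net::FTP sketch showing both directions; the host,
credentials, and file names are placeholders, and C<ftp_get_and_put>
is a hypothetical helper. Without a second argument, get() and put()
keep the file's base name on the receiving side.

    use strict;
    use warnings;
    use Net::FTP;

    # ftp_get_and_put() is a hypothetical helper; every argument is a
    # placeholder you would supply yourself.
    sub ftp_get_and_put {
        my (%arg) = @_;
        my $ftp = Net::FTP->new($arg{host}, Passive => 1, Timeout => 60)
            or die "Can't connect to $arg{host}: $@\n";
        $ftp->login($arg{user}, $arg{password})
            or die "Login failed: ", $ftp->message;
        $ftp->binary;    # avoid newline translation
        $ftp->get($arg{fetch})
            or die "get failed: ", $ftp->message;
        $ftp->put($arg{upload})
            or die "put failed: ", $ftp->message;
        $ftp->quit;
    }

    # ftp_get_and_put(host => 'ftp.example.com', user => 'anonymous',
    #                 password => 'you@example.com',
    #                 fetch => 'pub/README', upload => 'notes.txt');
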
=head2 How can I do RPC in Perl?

A DCE::RPC module is being developed (but is not yet available), and
will be released as part of the DCE-Perl package (available from
CPAN). No ONC::RPC module is known.

=head1 AUTHOR AND COPYRIGHT

Copyright (c) 1997 Tom Christiansen and Nathan Torkington.
All rights reserved. See L<perlfaq> for distribution information.