Commit | Line | Data |
---|---|---|
68dc0745 | 1 | =head1 NAME |
2 | ||
3e3baf6d | 3 | perlfaq9 - Networking ($Revision: 1.17 $, $Date: 1997/04/24 22:44:29 $) |
68dc0745 | 4 | |
5 | =head1 DESCRIPTION | |
6 | ||
7 | This section deals with questions related to networking, the internet, | |
8 | and a few on the web. | |
9 | ||
10 | =head2 My CGI script runs from the command line but not the browser. Can you help me fix it? | |
11 | ||
12 | Sure, but you probably can't afford our contracting rates :-) | |
13 | ||
14 | Seriously, if you can demonstrate that you've read the following FAQs | |
15 | and that your problem isn't something simple that can be easily | |
16 | answered, you'll probably receive a courteous and useful reply to your | |
17 | question if you post it on comp.infosystems.www.authoring.cgi (if it's | |
18 | something to do with HTTP, HTML, or the CGI protocols). Questions that | |
19 | appear to be Perl questions but are really CGI ones that are posted to | |
20 | comp.lang.perl.misc may not be so well received. | |
21 | ||
22 | The useful FAQs are: | |
23 | ||
24 | http://www.perl.com/perl/faq/idiots-guide.html | |
25 | http://www3.pair.com/webthing/docs/cgi/faqs/cgifaq.shtml | |
26 | http://www.perl.com/perl/faq/perl-cgi-faq.html | |
27 | http://www-genome.wi.mit.edu/WWW/faqs/www-security-faq.html | |
28 | http://www.boutell.com/faq/ | |
29 | ||
30 | =head2 How do I remove HTML from a string? | |
31 | ||
32 | The most correct way (albeit not the fastest) is to use HTML::Parse | |
33 | from CPAN (part of the libwww-perl distribution, which is a must-have | |
34 | module for all web hackers). | |
35 | ||
36 | Many folks attempt a simple-minded regular expression approach, like | |
37 | C<s/E<lt>.*?E<gt>//g>, but that fails in many cases because the tags | |
38 | may continue over line breaks, they may contain quoted angle-brackets, | |
39 | or HTML comments may be present. Plus folks forget to convert | |
40 | entities, like C<&lt;> for example. | |
41 | ||
42 | Here's one "simple-minded" approach, that works for most files: | |
43 | ||
44 | #!/usr/bin/perl -p0777 | |
45 | s/<(?:[^>'"]*|(['"]).*?\1)*>//gs | |
46 | ||
47 | If you want a more complete solution, see the 3-stage striphtml | |
48 | program in | |
49 | http://www.perl.com/CPAN/authors/Tom_Christiansen/scripts/striphtml.gz | |
50 | . | |
51 | ||
52 | =head2 How do I extract URLs? | |
53 | ||
54310121 | 54 | A quick but imperfect approach is |
68dc0745 | 55 | |
56 | #!/usr/bin/perl -n00 | |
57 | # qxurl - tchrist@perl.com | |
58 | print "$2\n" while m{ | |
59 | < \s* | |
60 | A \s+ HREF \s* = \s* (["']) (.*?) \1 | |
61 | \s* > | |
62 | }gsix; | |
63 | ||
64 | This version does not adjust relative URLs, understand alternate | |
46fc3d4c | 65 | bases, deal with HTML comments, deal with HREF and NAME attributes in |
66 | the same tag, or accept URLs themselves as arguments. It also runs | |
67 | about 100x faster than a more "complete" solution using the LWP suite | |
68 | of modules, such as the | |
68dc0745 | 69 | http://www.perl.com/CPAN/authors/Tom_Christiansen/scripts/xurl.gz |
70 | program. | |
71 | ||
72 | =head2 How do I download a file from the user's machine? How do I open a file on another machine? | |
73 | ||
74 | In the context of an HTML form, you can use what's known as | |
75 | B<multipart/form-data> encoding. The CGI.pm module (available from | |
76 | CPAN) supports this in the start_multipart_form() method, which isn't | |
77 | the same as the startform() method. | |
78 | ||
79 | =head2 How do I make a pop-up menu in HTML? | |
80 | ||
81 | Use the B<E<lt>SELECTE<gt>> and B<E<lt>OPTIONE<gt>> tags. The CGI.pm | |
82 | module (available from CPAN) supports this widget, as well as many | |
83 | others, including some that it cleverly synthesizes on its own. | |
84 | ||
85 | =head2 How do I fetch an HTML file? | |
86 | ||
46fc3d4c | 87 | One approach, if you have the lynx text-based HTML browser installed |
88 | on your system, is this: | |
68dc0745 | 89 | |
90 | $html_code = `lynx -source $url`; | |
91 | $text_data = `lynx -dump $url`; | |
92 | ||
46fc3d4c | 93 | The libwww-perl (LWP) modules from CPAN provide a more powerful way to |
94 | do this. They work through proxies, and don't require lynx: | |
95 | ||
96 | # print HTML from a URL | |
97 | use LWP::Simple; | |
98 | getprint "http://www.sn.no/libwww-perl/"; | |
99 | ||
100 | # print ASCII from HTML from a URL | |
101 | use LWP::Simple; | |
102 | use HTML::Parse; | |
103 | use HTML::FormatText; | |
104 | my ($html, $ascii); | |
105 | $html = get("http://www.perl.com/"); | |
106 | defined $html | |
107 | or die "Can't fetch HTML from http://www.perl.com/"; | |
108 | $ascii = HTML::FormatText->new->format(parse_html($html)); | |
109 | print $ascii; | |
110 | ||
68dc0745 | 111 | =head2 How do I decode or create those %-encodings on the web? |
112 | ||
113 | Here's an example of decoding: | |
114 | ||
115 | $string = "http://altavista.digital.com/cgi-bin/query?pg=q&what=news&fmt=.&q=%2Bcgi-bin+%2Bperl.exe"; | |
116 | $string =~ s/%([a-fA-F0-9]{2})/chr(hex($1))/ge; | |
117 | ||
118 | Encoding is a bit harder, because you can't just blindly change | |
119 | all the non-alphanumunder characters (C<\W>) into their hex escapes. | |
120 | It's important that characters with special meaning like C</> and C<?> | |
121 | I<not> be translated. Probably the easiest way to get this right is | |
122 | to avoid reinventing the wheel and just use the URI::Escape module, | |
123 | which is part of the libwww-perl package (LWP) available from CPAN. | |
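If you only want to see what the encoding does to one piece of a URL, here is a minimal sketch in core Perl. It assumes you are encoding a single component (say, one query value) as a byte string, never a whole URL, and it assumes the set of characters left alone is A-Z, a-z, 0-9, hyphen, underscore, dot and tilde. The names encode_component() and decode_component() are made up for this illustration; they are not part of URI::Escape.

```perl
use strict;
use warnings;

# Encode one component (for example a single query value), never a whole URL.
# Everything outside the assumed safe set A-Z a-z 0-9 - _ . ~ becomes %XX.
sub encode_component {
    my ($text) = @_;
    $text =~ s/([^A-Za-z0-9\-_.~])/sprintf("%%%02X", ord($1))/ge;
    return $text;
}

# Inverse, using the same substitution as the decoding example.
sub decode_component {
    my ($text) = @_;
    $text =~ s/%([a-fA-F0-9]{2})/chr(hex($1))/ge;
    return $text;
}

my $value   = '+cgi-bin +perl.exe';
my $encoded = encode_component($value);
print "$encoded\n";
print decode_component($encoded), "\n";
```

Because C</> and C<?> are escaped here too, running this over a complete URL would mangle it, which is exactly the trap described above.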
124 | ||
125 | =head2 How do I redirect to another page? | |
126 | ||
127 | Instead of sending back a C<Content-Type> as the headers of your | |
128 | reply, send back a C<Location:> header. Officially this should be a | |
129 | C<URI:> header, so the CGI.pm module (available from CPAN) sends back | |
130 | both: | |
131 | ||
132 | Location: http://www.domain.com/newpage | |
133 | URI: http://www.domain.com/newpage | |
134 | ||
135 | Note that relative URLs in these headers can cause strange effects | |
136 | because of "optimizations" that servers do. | |
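As a rough sketch, a CGI script that does nothing but redirect could build its reply like this. The redirect_header() name is hypothetical, the URL is the placeholder used above, and the check for embedded line breaks is an added precaution rather than anything the CGI protocol demands.

```perl
use strict;
use warnings;

# Build the header block for a redirect. The URL is absolute, and anything
# containing a line break is refused so it cannot smuggle in extra headers.
sub redirect_header {
    my ($url) = @_;
    die "refusing URL with embedded line break\n" if $url =~ /[\r\n]/;
    return "Location: $url\n\n";   # blank line ends the headers
}

print redirect_header('http://www.domain.com/newpage');
```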
137 | ||
138 | =head2 How do I put a password on my web pages? | |
139 | ||
140 | That depends. You'll need to read the documentation for your web | |
141 | server, or perhaps check some of the other FAQs referenced above. | |
142 | ||
143 | =head2 How do I edit my .htpasswd and .htgroup files with Perl? | |
144 | ||
145 | The HTTPD::UserAdmin and HTTPD::GroupAdmin modules provide a | |
146 | consistent OO interface to these files, regardless of how they're | |
46fc3d4c | 147 | stored. Databases may be text, dbm, Berkeley DB or any database with a |
68dc0745 | 148 | DBI compatible driver. HTTPD::UserAdmin supports files used by the |
149 | `Basic' and `Digest' authentication schemes. Here's an example: | |
150 | ||
151 | use HTTPD::UserAdmin (); | |
152 | HTTPD::UserAdmin | |
153 | ->new(DB => "/foo/.htpasswd") | |
154 | ->add($username => $password); | |
155 | ||
46fc3d4c | 156 | =head2 How do I make sure users can't enter values into a form that cause my CGI script to do bad things? |
157 | ||
158 | Read the CGI security FAQ, at | |
159 | http://www-genome.wi.mit.edu/WWW/faqs/www-security-faq.html, and the | |
160 | Perl/CGI FAQ at | |
161 | http://www.perl.com/CPAN/doc/FAQs/cgi/perl-cgi-faq.html. | |
162 | ||
163 | In brief: use tainting (see L<perlsec>), which makes sure that data | |
164 | from outside your script (eg, CGI parameters) are never used in | |
165 | C<eval> or C<system> calls. In addition to tainting, never use the | |
166 | single-argument form of system() or exec(). Instead, supply the | |
167 | command and arguments as a list, which bypasses the shell and its globbing. | |
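To see why the list form matters, here is a small sketch for a Unix-like system with a printf program on the PATH (both are assumptions). It uses the list form of a piped open(), which follows the same no-shell rule as list-form system(), so that the argument can be read back and inspected. The hostile-looking value is invented for the example.

```perl
use strict;
use warnings;

# Hypothetical hostile form value: a shell would treat ';' as a command separator.
my $user_input = 'report.txt; echo INJECTED';

# List form: Perl runs the program directly, so no shell ever parses the argument.
# printf(1) with a '%s\n' format is used here only to echo the argument back.
open(my $fh, '-|', 'printf', '%s\n', $user_input)
    or die "Can't run printf: $!";
my $echoed = <$fh>;
close($fh) or die "printf failed: $?";
chomp $echoed;

print "argument arrived intact: $echoed\n";
```

Had the same string been interpolated into a single-argument system() call, the shell would have split it at the semicolon and run the second command.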
168 | ||
5a964f20 | 169 | =head2 How do I parse a mail header? |
68dc0745 | 170 | |
171 | For a quick-and-dirty approach, try this solution derived | |
172 | from page 222 of the 2nd edition of "Programming Perl": | |
173 | ||
174 | $/ = ''; | |
175 | $header = <MSG>; | |
176 | $header =~ s/\n\s+/ /g; # merge continuation lines | |
177 | %head = ( UNIX_FROM_LINE, split /^([-\w]+):\s*/m, $header ); | |
178 | ||
179 | That solution doesn't do well if, for example, you're trying to | |
180 | maintain all the Received lines. A more complete approach is to use | |
181 | the Mail::Header module from CPAN (part of the MailTools package). | |
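If you would rather not pull in a module, one way around the repeated-field problem is to store every field as a list. The following is only a sketch in core Perl, not how Mail::Header does it internally; the sample header and the choice to lowercase field names are assumptions made for the illustration.

```perl
use strict;
use warnings;

# Hypothetical sample header; the folded Subject line tests continuation merging.
my $header = join "\n",
    'Received: from a.example by b.example',
    'Received: from c.example by a.example',
    'Subject: a folded',
    "\tsubject line",
    'To: gnat@frii.com',
    '';

$header =~ s/\n[ \t]+/ /g;               # merge continuation lines

my %head;
for my $line (split /\n/, $header) {
    my ($name, $value) = $line =~ /^([-\w]+):\s*(.*)$/ or next;
    push @{ $head{lc $name} }, $value;   # keep every occurrence, in order
}

print scalar @{ $head{received} }, " Received lines\n";
print "Subject: $head{subject}[0]\n";
```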
182 | ||
183 | =head2 How do I decode a CGI form? | |
184 | ||
185 | A lot of people are tempted to code this up themselves, so you've | |
186 | probably all seen a lot of code involving C<$ENV{CONTENT_LENGTH}> and | |
187 | C<$ENV{QUERY_STRING}>. It's true that this can work, but there are | |
188 | also a lot of versions of this floating around that are quite simply | |
189 | broken! | |
190 | ||
191 | Please do not be tempted to reinvent the wheel. Instead, use the | |
192 | CGI.pm or CGI_Lite.pm (available from CPAN), or if you're trapped in | |
193 | the module-free land of perl1 .. perl4, you might look into cgi-lib.pl | |
194 | (available from http://www.bio.cam.ac.uk/web/form.html). | |
195 | ||
5a964f20 | 196 | =head2 How do I check a valid mail address? |
68dc0745 | 197 | |
198 | You can't. | |
199 | ||
200 | Without sending mail to the address and seeing whether it bounces (and | |
201 | even then you face the halting problem), you cannot determine whether | |
5a964f20 | 202 | a mail address is valid. Even if you apply the mail header |
68dc0745 | 203 | standard, you can have problems, because there are deliverable |
204 | addresses that aren't RFC-822 (the mail header standard) compliant, | |
205 | and addresses that aren't deliverable which are compliant. | |
206 | ||
5a964f20 | 207 | Many are tempted to try to eliminate many frequently-invalid mail |
68dc0745 | 208 | addresses with a simple regexp, such as |
209 | C</^[\w.-]+\@(?:[\w-]+\.)+\w+$/>. However, this also throws out many | |
210 | valid ones, and says nothing about potential deliverability, so is not | |
211 | suggested. Instead, see | |
212 | http://www.perl.com/CPAN/authors/Tom_Christiansen/scripts/ckaddr.gz , | |
213 | which actually checks against the full RFC spec (except for nested | |
5a964f20 | 214 | comments), looks for addresses you may not wish to accept mail to |
68dc0745 | 215 | (say, Bill Clinton or your postmaster), and then makes sure that the |
216 | hostname given can be looked up in DNS. It's not fast, but it works. | |
217 | ||
46fc3d4c | 218 | Here's an alternative strategy used by many CGI script authors: Check |
5a964f20 | 219 | the mail address with a simple regexp (such as the one above). If |
46fc3d4c | 220 | the regexp matched the address, accept the address. If the regexp |
221 | didn't match the address, request confirmation from the user that the | |
5a964f20 | 222 | mail address they entered was correct. |
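That accept-or-confirm strategy might be sketched as follows. The pattern is just one loose plausibility test, chosen for the example, and triage_address() is a hypothetical name. Note that nothing is ever rejected outright: a non-match only means the user gets asked to double-check.

```perl
use strict;
use warnings;

# Loose plausibility pattern (an assumption; it says nothing about deliverability).
my $plausible = qr/^[\w.-]+\@(?:[\w-]+\.)+\w+$/;

# Returns 'accept' when the pattern matches, 'confirm' when the user should
# be asked to double-check what they typed.
sub triage_address {
    my ($address) = @_;
    return $address =~ $plausible ? 'accept' : 'confirm';
}

for my $address ('gnat@frii.com', 'gnat@localhost', 'not an address') {
    printf "%-16s %s\n", $address, triage_address($address);
}
```

An address such as C<gnat@localhost> may well be deliverable on the local machine, which is why it lands in the "confirm" pile rather than being refused.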
46fc3d4c | 223 | |
68dc0745 | 224 | =head2 How do I decode a MIME/BASE64 string? |
225 | ||
226 | The MIME-Base64 package (available from CPAN) handles this, as well as | |
227 | the MIME quoted-printable encoding. Decoding BASE64 becomes as simple as: | |
228 | ||
229 | use MIME::Base64; | |
230 | $decoded = decode_base64($encoded); | |
231 | ||
232 | A more direct approach is to use the unpack() function's "u" | |
233 | format after minor transliterations: | |
234 | ||
235 | tr#A-Za-z0-9+/##cd; # remove non-base64 chars | |
236 | tr#A-Za-z0-9+/# -_#; # convert to uuencoded format | |
237 | $len = pack("c", 32 + 0.75*length); # compute length byte | |
238 | print unpack("u", $len . $_); # uudecode and print | |
239 | ||
5a964f20 | 240 | =head2 How do I return the user's mail address? |
68dc0745 | 241 | |
242 | On systems that support getpwuid, the $E<lt> variable and the | |
243 | Sys::Hostname module (which is part of the standard perl distribution), | |
244 | you can probably try using something like this: | |
245 | ||
246 | use Sys::Hostname; | |
247 | $address = sprintf('%s@%s', scalar getpwuid($<), hostname); | |
248 | ||
5a964f20 | 249 | Company policies on mail addresses can mean that this generates addresses |
250 | that the company's mail system will not accept, so you should ask for | |
251 | users' mail addresses when this matters. Furthermore, not all systems | |
68dc0745 | 252 | on which Perl runs are so forthcoming with this information as is Unix. |
253 | ||
254 | The Mail::Util module from CPAN (part of the MailTools package) provides a | |
255 | mailaddress() function that tries to guess the mail address of the user. | |
256 | It makes a more intelligent guess than the code above, using information | |
257 | given when the module was installed, but it could still be incorrect. | |
258 | Again, the best way is often just to ask the user. | |
259 | ||
260 | =head2 How do I send/read mail? | |
261 | ||
262 | Sending mail: the Mail::Mailer module from CPAN (part of the MailTools | |
46fc3d4c | 263 | package) is UNIX-centric, while Mail::Internet uses Net::SMTP which is |
264 | not UNIX-centric. Reading mail: use the Mail::Folder module from CPAN | |
68dc0745 | 265 | (part of the MailFolder package) or the Mail::Internet module from |
266 | CPAN (also part of the MailTools package). | |
267 | ||
3fe9a6f1 | 268 | # sending mail |
269 | use Mail::Internet; | |
270 | use Mail::Header; | |
271 | # say which mail host to use | |
272 | $ENV{SMTPHOSTS} = 'mail.frii.com'; | |
273 | # create headers | |
274 | $header = new Mail::Header; | |
275 | $header->add('From', 'gnat@frii.com'); | |
276 | $header->add('Subject', 'Testing'); | |
277 | $header->add('To', 'gnat@frii.com'); | |
278 | # create body | |
279 | $body = 'This is a test, ignore'; | |
280 | # create mail object | |
281 | $mail = new Mail::Internet(undef, Header => $header, Body => [$body]); | |
282 | # send it | |
283 | $mail->smtpsend or die; | |
284 | ||
68dc0745 | 285 | =head2 How do I find out my hostname/domainname/IP address? |
286 | ||
287 | A lot of code has historically cavalierly called the C<`hostname`> | |
288 | program. While sometimes expedient, this isn't very portable. It's | |
289 | one of those tradeoffs of convenience versus portability. | |
290 | ||
291 | The Sys::Hostname module (part of the standard perl distribution) will | |
292 | give you the hostname after which you can find out the IP address | |
293 | (assuming you have working DNS) with a gethostbyname() call. | |
294 | ||
295 | use Socket; | |
296 | use Sys::Hostname; | |
297 | my $host = hostname(); | |
3e3baf6d | 298 | my $addr = inet_ntoa(scalar gethostbyname($host || 'localhost')); |
68dc0745 | 299 | |
300 | Probably the simplest way to learn your DNS domain name is to grok | |
301 | it out of /etc/resolv.conf, at least under Unix. Of course, this | |
302 | assumes several things about your resolv.conf configuration, including | |
303 | that it exists. | |
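Here is one way that might look, as a hedged sketch. It assumes the file lives at /etc/resolv.conf, and it assumes the first name on a C<domain> or C<search> line is the one you want, which is a convention rather than a guarantee. The function name domain_from_resolv_conf() is made up for this example.

```perl
use strict;
use warnings;

# Returns the first name from a "domain" or "search" line, or undef if none.
sub domain_from_resolv_conf {
    my ($text) = @_;
    for my $line (split /\n/, $text) {
        return $1 if $line =~ /^\s*(?:domain|search)\s+(\S+)/;
    }
    return;
}

my $path = '/etc/resolv.conf';
if (open my $fh, '<', $path) {
    local $/;                      # slurp the whole file
    my $text = <$fh>;
    $text = '' unless defined $text;
    my $domain = domain_from_resolv_conf($text);
    print defined $domain ? "$domain\n" : "no domain or search line in $path\n";
} else {
    print "can't read $path: $!\n";
}
```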
304 | ||
305 | (We still need a good DNS domain name-learning method for non-Unix | |
306 | systems.) | |
307 | ||
308 | =head2 How do I fetch a news article or the active newsgroups? | |
309 | ||
310 | Use the Net::NNTP or News::NNTPClient modules, both available from CPAN. | |
311 | This can make tasks like fetching the newsgroup list as simple as: | |
312 | ||
313 | perl -MNews::NNTPClient \ | |
314 | -e 'print News::NNTPClient->new->list("newsgroups")' | |
315 | ||
316 | =head2 How do I fetch/put an FTP file? | |
317 | ||
318 | LWP::Simple (available from CPAN) can fetch but not put. Net::FTP (also | |
319 | available from CPAN) is more complex but can put as well as fetch. | |
320 | ||
321 | =head2 How can I do RPC in Perl? | |
322 | ||
323 | A DCE::RPC module is being developed (but is not yet available), and | |
324 | will be released as part of the DCE-Perl package (available from | |
325 | CPAN). No ONC::RPC module is known. | |
326 | ||
327 | =head1 AUTHOR AND COPYRIGHT | |
328 | ||
5a964f20 | 329 | Copyright (c) 1997, 1998 Tom Christiansen and Nathan Torkington. |
330 | All rights reserved. | |
331 | ||
332 | When included as part of the Standard Version of Perl, or as part of | |
333 | its complete documentation whether printed or otherwise, this work | |
334 | may be distributed only under the terms of Perl's Artistic License. | |
335 | Any distribution of this file or derivatives thereof I<outside> | |
336 | of that package require that special arrangements be made with | |
337 | copyright holder. | |
338 | ||
339 | Irrespective of its distribution, all code examples in this file | |
340 | are hereby placed into the public domain. You are permitted and | |
341 | encouraged to use this code in your own programs for fun | |
342 | or for profit as you see fit. A simple comment in the code giving | |
343 | credit would be courteous but is not required. |