68dc0745 |
1 | =head1 NAME |
2 | |
3 | perlfaq9 - Networking ($Revision: 1.13 $) |
4 | |
5 | =head1 DESCRIPTION |
6 | |
7 | This section deals with questions related to networking, the internet, |
8 | and a few on the web. |
9 | |
10 | =head2 My CGI script runs from the command line but not the browser. Can you help me fix it? |
11 | |
12 | Sure, but you probably can't afford our contracting rates :-) |
13 | |
14 | Seriously, if you can demonstrate that you've read the following FAQs |
15 | and that your problem isn't something simple that can be easily |
16 | answered, you'll probably receive a courteous and useful reply to your |
17 | question if you post it on comp.infosystems.www.authoring.cgi (if it's |
18 | something to do with HTTP, HTML, or the CGI protocols). Questions that |
19 | appear to be Perl questions but are really CGI ones that are posted to |
20 | comp.lang.perl.misc may not be so well received. |
21 | |
22 | The useful FAQs are: |
23 | |
24 | http://www.perl.com/perl/faq/idiots-guide.html |
25 | http://www3.pair.com/webthing/docs/cgi/faqs/cgifaq.shtml |
26 | http://www.perl.com/perl/faq/perl-cgi-faq.html |
27 | http://www-genome.wi.mit.edu/WWW/faqs/www-security-faq.html |
28 | http://www.boutell.com/faq/ |
29 | |
30 | =head2 How do I remove HTML from a string? |
31 | |
32 | The most correct way (albeit not the fastest) is to use HTML::Parse |
33 | from CPAN (part of the libwww-perl distribution, which is a must-have |
34 | module for all web hackers). |
35 | |
36 | Many folks attempt a simple-minded regular expression approach, like |
37 | C<s/E<lt>.*?E<gt>//g>, but that fails in many cases because the tags |
38 | may continue over line breaks, they may contain quoted angle-brackets, |
39 | or HTML comments may be present. Plus, folks forget to convert |
40 | entities, like C<&lt;> for example. |
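To see the failure concretely, here is a minimal sketch that runs the naive substitution over a small made-up fragment. The sample HTML and the variable names are invented for illustration; the fragment contains a quoted angle bracket and a comment, two of the cases mentioned above.

```perl
use strict;
use warnings;

# Made-up sample: a quoted ">" inside an attribute, plus an HTML comment.
my $html = qq{<img src="a.gif" alt="a > b">text<!-- <b>hidden</b> -->};

# The naive approach: delete anything that looks like <...>.
(my $naive = $html) =~ s/<.*?>//g;

# Ideally only "text" would survive; print what actually does.
print "naive result: [$naive]\n";
print "naive approach ", ($naive eq "text" ? "worked" : "left debris behind"), "\n";
```

The first tag is cut short at the quoted C<E<gt>>, and the comment is treated as if it were ordinary markup, so pieces of both end up in the output.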
41 | |
42 | Here's one "simple-minded" approach, that works for most files: |
43 | |
44 | #!/usr/bin/perl -p0777 |
45 | s/<(?:[^>'"]*|(['"]).*?\1)*>//gs |
46 | |
47 | If you want a more complete solution, see the 3-stage striphtml |
48 | program in |
49 | http://www.perl.com/CPAN/authors/Tom_Christiansen/scripts/striphtml.gz |
50 | . |
51 | |
52 | =head2 How do I extract URLs? |
53 | |
54 | A quick but imperfect approach is |
55 | |
56 | #!/usr/bin/perl -n00 |
57 | # qxurl - tchrist@perl.com |
58 | print "$2\n" while m{ |
59 | < \s* |
60 | A \s+ HREF \s* = \s* (["']) (.*?) \1 |
61 | \s* > |
62 | }gsix; |
63 | |
64 | This version does not adjust relative URLs, understand alternate |
65 | bases, deal with HTML comments, or accept URLs themselves as |
66 | arguments. It also runs about 100x faster than a more "complete" |
67 | solution using the LWP suite of modules, such as the |
68 | http://www.perl.com/CPAN/authors/Tom_Christiansen/scripts/xurl.gz |
69 | program. |
70 | |
71 | =head2 How do I download a file from the user's machine? How do I open a file on another machine? |
72 | |
73 | In the context of an HTML form, you can use what's known as |
74 | B<multipart/form-data> encoding. The CGI.pm module (available from |
75 | CPAN) supports this in the start_multipart_form() method, which isn't |
76 | the same as the startform() method. |
77 | |
78 | =head2 How do I make a pop-up menu in HTML? |
79 | |
80 | Use the B<E<lt>SELECTE<gt>> and B<E<lt>OPTIONE<gt>> tags. The CGI.pm |
81 | module (available from CPAN) supports this widget, as well as many |
82 | others, including some that it cleverly synthesizes on its own. |
83 | |
84 | =head2 How do I fetch an HTML file? |
85 | |
86 | Use the LWP::Simple module available from CPAN, part of the excellent |
87 | libwww-perl (LWP) package. On the other hand, if you have the |
88 | lynx text-based HTML browser installed on your system, this isn't too |
89 | bad: |
90 | |
91 | $html_code = `lynx -source $url`; |
92 | $text_data = `lynx -dump $url`; |
93 | |
94 | =head2 How do I decode or create those %-encodings on the web? |
95 | |
96 | Here's an example of decoding: |
97 | |
98 | $string = "http://altavista.digital.com/cgi-bin/query?pg=q&what=news&fmt=.&q=%2Bcgi-bin+%2Bperl.exe"; |
99 | $string =~ s/%([a-fA-F0-9]{2})/chr(hex($1))/ge; |
100 | |
101 | Encoding is a bit harder, because you can't just blindly change |
102 | all the non-alphanumunder characters (C<\W>) into their hex escapes. |
103 | It's important that characters with special meaning like C</> and C<?> |
104 | I<not> be translated. Probably the easiest way to get this right is |
105 | to avoid reinventing the wheel and just use the URI::Escape module, |
106 | which is part of the libwww-perl package (LWP) available from CPAN. |
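For illustration only, here is a minimal pure-Perl sketch of the idea: escape a single URL component (such as one query value) instead of the whole URL, so that the structural C</> and C<?> are never touched. The names escape_component() and unescape_component() are made up for this example and are not part of URI::Escape, and the set of characters left unescaped (letters, digits, and C<-_.~>) is an assumption. It works on byte strings only.

```perl
use strict;
use warnings;

# Hypothetical helper: escape ONE component of a URL, not a whole URL.
sub escape_component {
    my ($text) = @_;
    # Turn every byte outside the assumed "safe" set into %XX.
    $text =~ s/([^A-Za-z0-9\-_.~])/sprintf("%%%02X", ord($1))/ge;
    return $text;
}

# Hypothetical helper: reverse the escaping for one component.
sub unescape_component {
    my ($text) = @_;
    $text =~ tr/+/ /;                                # forms send spaces as "+"
    $text =~ s/%([a-fA-F0-9]{2})/chr(hex($1))/ge;    # then undo the %XX escapes
    return $text;
}

my $query = "+cgi-bin +perl.exe";
my $url   = "http://www.example.com/cgi-bin/query?q=" . escape_component($query);

print "$url\n";
print unescape_component(escape_component($query)), "\n";
```

In real code, URI::Escape remains the safer choice, since it already knows which characters are reserved.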
107 | |
108 | =head2 How do I redirect to another page? |
109 | |
110 | Instead of sending back a C<Content-Type> as the headers of your |
111 | reply, send back a C<Location:> header. Officially this should be a |
112 | C<URI:> header, so the CGI.pm module (available from CPAN) sends back |
113 | both: |
114 | |
115 | Location: http://www.domain.com/newpage |
116 | URI: http://www.domain.com/newpage |
117 | |
118 | Note that relative URLs in these headers can cause strange effects |
119 | because of "optimizations" that servers do. |
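A bare-bones CGI script that does this by hand might look like the sketch below. The helper name redirect_headers() is made up for this example, and the target is the same placeholder URL used above. An absolute URL is used to sidestep the relative-URL effects just mentioned.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: build the complete header block for a redirect.
sub redirect_headers {
    my ($url) = @_;
    # One Location header, then the blank line that ends the headers.
    return "Location: $url\n\n";
}

# No Content-Type and no body: the header block is the whole reply.
print redirect_headers("http://www.domain.com/newpage");
```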
120 | |
121 | =head2 How do I put a password on my web pages? |
122 | |
123 | That depends. You'll need to read the documentation for your web |
124 | server, or perhaps check some of the other FAQs referenced above. |
125 | |
126 | =head2 How do I edit my .htpasswd and .htgroup files with Perl? |
127 | |
128 | The HTTPD::UserAdmin and HTTPD::GroupAdmin modules provide a |
129 | consistent OO interface to these files, regardless of how they're |
130 | stored. Databases may be text, dbm, Berkeley DB or any database with a |
131 | DBI compatible driver. HTTPD::UserAdmin supports files used by the |
132 | `Basic' and `Digest' authentication schemes. Here's an example: |
133 | |
134 | use HTTPD::UserAdmin (); |
135 | HTTPD::UserAdmin |
136 | ->new(DB => "/foo/.htpasswd") |
137 | ->add($username => $password); |
138 | |
139 | =head2 How do I parse an email header? |
140 | |
141 | For a quick-and-dirty solution, try this one, derived |
142 | from page 222 of the 2nd edition of "Programming Perl": |
143 | |
144 | $/ = ''; |
145 | $header = <MSG>; |
146 | $header =~ s/\n\s+/ /g; # merge continuation lines |
147 | %head = ( UNIX_FROM_LINE, split /^([-\w]+):\s*/m, $header ); |
148 | |
149 | That solution doesn't do well if, for example, you're trying to |
150 | maintain all the Received lines. A more complete approach is to use |
151 | the Mail::Header module from CPAN (part of the MailTools package). |
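If installing a module is not an option, one way to keep repeated fields is to store every value in a list. The following is only a sketch under that assumption: parse_header() is a made-up name, field names are lowercased for lookup, and the sample message is invented. It is not how Mail::Header works internally.

```perl
use strict;
use warnings;

# Hypothetical helper: parse a header block into a hash of array refs,
# so that repeated fields such as Received are all kept, in order.
sub parse_header {
    my ($header) = @_;
    $header =~ s/\n[ \t]+/ /g;                 # merge continuation lines
    my %head;
    for my $line (split /\n/, $header) {
        # Lines without a "Name:" prefix (e.g. a Unix From line) are skipped.
        next unless $line =~ /^([-\w]+):\s*(.*)$/;
        push @{ $head{lc $1} }, $2;
    }
    return \%head;
}

# Invented sample header with two Received fields, one of them folded.
my $raw = join "\n",
    "Received: from a.example.com by b.example.com",
    "Received: from c.example.com",
    "\tby a.example.com",
    "Subject: test",
    "";

my $head = parse_header($raw);
print scalar @{ $head->{received} }, " Received lines\n";
print "Subject: $head->{subject}[0]\n";
```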
152 | |
153 | =head2 How do I decode a CGI form? |
154 | |
155 | A lot of people are tempted to code this up themselves, so you've |
156 | probably all seen a lot of code involving C<$ENV{CONTENT_LENGTH}> and |
157 | C<$ENV{QUERY_STRING}>. It's true that this can work, but there are |
158 | also a lot of versions of this floating around that are quite simply |
159 | broken! |
160 | |
161 | Please do not be tempted to reinvent the wheel. Instead, use the |
162 | CGI.pm or CGI_Lite.pm modules (available from CPAN), or if you're trapped in |
163 | the module-free land of perl1 .. perl4, you might look into cgi-lib.pl |
164 | (available from http://www.bio.cam.ac.uk/web/form.html). |
165 | |
166 | =head2 How do I check a valid email address? |
167 | |
168 | You can't. |
169 | |
170 | Without sending mail to the address and seeing whether it bounces (and |
171 | even then you face the halting problem), you cannot determine whether |
172 | an email address is valid. Even if you apply the email header |
173 | standard, you can have problems, because there are deliverable |
174 | addresses that aren't RFC-822 (the mail header standard) compliant, |
175 | and addresses that aren't deliverable which are compliant. |
176 | |
177 | Many are tempted to try to eliminate many frequently-invalid email |
178 | addresses with a simple regexp, such as |
179 | C</^[\w.-]+\@([\w-]+\.)+\w+$/>. However, this also throws out many |
180 | valid ones, and says nothing about potential deliverability, so is not |
181 | suggested. Instead, see |
182 | http://www.perl.com/CPAN/authors/Tom_Christiansen/scripts/ckaddr.gz , |
183 | which actually checks against the full RFC spec (except for nested |
184 | comments), looks for addresses you may not wish to accept email to |
185 | (say, Bill Clinton or your postmaster), and then makes sure that the |
186 | hostname given can be looked up in DNS. It's not fast, but it works. |
187 | |
188 | =head2 How do I decode a MIME/BASE64 string? |
189 | |
190 | The MIME-tools package (available from CPAN) handles this and a lot |
191 | more. Decoding BASE64 becomes as simple as: |
192 | |
193 | use MIME::Base64; |
194 | $decoded = decode_base64($encoded); |
195 | |
196 | A more direct approach is to use the unpack() function's "u" |
197 | format after minor transliterations: |
198 | |
199 | tr#A-Za-z0-9+/##cd; # remove non-base64 chars |
200 | tr#A-Za-z0-9+/# -_#; # convert to uuencoded format |
201 | $len = pack("c", 32 + 0.75*length); # compute length byte |
202 | print unpack("u", $len . $_); # uudecode and print |
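As a quick sanity check of the module route, the sketch below encodes a string and decodes it again. The sample text and variable names are arbitrary; the empty string passed as the second argument to encode_base64() asks it not to append a line ending.

```perl
use strict;
use warnings;
use MIME::Base64 qw(encode_base64 decode_base64);

my $plain   = "Just another Perl hacker,";
my $encoded = encode_base64($plain, "");   # "" means: no trailing newline
my $decoded = decode_base64($encoded);

print "$encoded\n";
print "$decoded\n";
print "round trip ", ($decoded eq $plain ? "ok" : "FAILED"), "\n";
```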
203 | |
204 | =head2 How do I return the user's email address? |
205 | |
206 | On systems that support getpwuid, the $E<lt> variable and the |
207 | Sys::Hostname module (which is part of the standard perl distribution), |
208 | you can probably try using something like this: |
209 | |
210 | use Sys::Hostname; |
211 | $address = sprintf('%s@%s', scalar getpwuid($<), hostname); |
212 | |
213 | Company policies on email addresses can mean that this generates addresses |
214 | that the company's email system will not accept, so you should ask for |
215 | users' email addresses when this matters. Furthermore, not all systems |
216 | on which Perl runs are so forthcoming with this information as is Unix. |
217 | |
218 | The Mail::Util module from CPAN (part of the MailTools package) provides a |
219 | mailaddress() function that tries to guess the mail address of the user. |
220 | It makes a more intelligent guess than the code above, using information |
221 | given when the module was installed, but it could still be incorrect. |
222 | Again, the best way is often just to ask the user. |
223 | |
224 | =head2 How do I send/read mail? |
225 | |
226 | Sending mail: the Mail::Mailer module from CPAN (part of the MailTools |
227 | package) is UNIX-centric, while Mail::Internet uses Net::SMTP which is |
228 | not UNIX-centric. Reading mail: use the Mail::Folder module from CPAN |
229 | (part of the MailFolder package) or the Mail::Internet module from |
230 | CPAN (also part of the MailTools package). |
231 | |
232 | =head2 How do I find out my hostname/domainname/IP address? |
233 | |
234 | Historically, a lot of code has cavalierly called the C<`hostname`> |
235 | program. While sometimes expedient, this isn't very portable. It's |
236 | one of those tradeoffs of convenience versus portability. |
237 | |
238 | The Sys::Hostname module (part of the standard perl distribution) will |
239 | give you the hostname after which you can find out the IP address |
240 | (assuming you have working DNS) with a gethostbyname() call. |
241 | |
242 | use Socket; |
243 | use Sys::Hostname; |
244 | my $host = hostname(); |
245 | my $addr = inet_ntoa(scalar gethostbyname($host || 'localhost')); |
246 | |
247 | Probably the simplest way to learn your DNS domain name is to grok |
248 | it out of /etc/resolv.conf, at least under Unix. Of course, this |
249 | assumes several things about your resolv.conf configuration, including |
250 | that it exists. |
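A minimal sketch of that approach follows. The helper name domain_from_resolv() is made up for this example, and it assumes the conventional resolv.conf layout: a C<domain> line, or failing that, the first entry on a C<search> line. Real configurations may have neither.

```perl
use strict;
use warnings;

# Hypothetical helper: pull a domain name out of resolv.conf-style text.
sub domain_from_resolv {
    my ($text) = @_;
    # Prefer an explicit "domain" line ...
    return $1 if $text =~ /^\s*domain\s+(\S+)/m;
    # ... and fall back to the first entry of a "search" line.
    return $1 if $text =~ /^\s*search\s+(\S+)/m;
    return;    # nothing usable found
}

my $file = "/etc/resolv.conf";
if (open my $fh, "<", $file) {
    local $/;                                  # slurp the whole file
    my $domain = domain_from_resolv(scalar <$fh>);
    print defined $domain ? "$domain\n" : "no domain entry in $file\n";
} else {
    warn "can't read $file: $!\n";
}
```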
251 | |
252 | (We still need a good DNS domain name-learning method for non-Unix |
253 | systems.) |
254 | |
255 | =head2 How do I fetch a news article or the active newsgroups? |
256 | |
257 | Use the Net::NNTP or News::NNTPClient modules, both available from CPAN. |
258 | This can make tasks like fetching the newsgroup list as simple as: |
259 | |
260 | perl -MNews::NNTPClient \ |
261 | -e 'print News::NNTPClient->new->list("newsgroups")' |
262 | |
263 | =head2 How do I fetch/put an FTP file? |
264 | |
265 | LWP::Simple (available from CPAN) can fetch but not put. Net::FTP (also |
266 | available from CPAN) is more complex but can put as well as fetch. |
267 | |
268 | =head2 How can I do RPC in Perl? |
269 | |
270 | A DCE::RPC module is being developed (but is not yet available), and |
271 | will be released as part of the DCE-Perl package (available from |
272 | CPAN). No ONC::RPC module is known. |
273 | |
274 | =head1 AUTHOR AND COPYRIGHT |
275 | |
276 | Copyright (c) 1997 Tom Christiansen and Nathan Torkington. |
277 | All rights reserved. See L<perlfaq> for distribution information. |