Commit | Line | Data |
---|---|---|
360aca43 GS |
1 | ############################################################################# |
2 | # Pod/Parser.pm -- package which defines a base class for parsing POD docs. | |
3 | # | |
66aff6dd | 4 | # Copyright (C) 1996-2000 by Bradford Appleton. All rights reserved. |
360aca43 GS |
5 | # This file is part of "PodParser". PodParser is free software; |
6 | # you can redistribute it and/or modify it under the same terms | |
7 | # as Perl itself. | |
8 | ############################################################################# | |
9 | ||
10 | package Pod::Parser; | |
11 | ||
12 | use vars qw($VERSION); | |
267d5541 | 13 | $VERSION = 1.35; ## Current version of this package |
828c4421 | 14 | require 5.005; ## requires this Perl version or later |
360aca43 GS |
15 | |
16 | ############################################################################# | |
17 | ||
18 | =head1 NAME | |
19 | ||
20 | Pod::Parser - base class for creating POD filters and translators | |
21 | ||
22 | =head1 SYNOPSIS | |
23 | ||
24 | use Pod::Parser; | |
25 | ||
26 | package MyParser; | |
27 | @ISA = qw(Pod::Parser); | |
28 | ||
29 | sub command { | |
30 | my ($parser, $command, $paragraph, $line_num) = @_; | |
31 | ## Interpret the command and its text; sample actions might be: | |
32 | if ($command eq 'head1') { ... } | |
33 | elsif ($command eq 'head2') { ... } | |
34 | ## ... other commands and their actions | |
35 | my $out_fh = $parser->output_handle(); | |
36 | my $expansion = $parser->interpolate($paragraph, $line_num); | |
37 | print $out_fh $expansion; | |
38 | } | |
39 | ||
40 | sub verbatim { | |
41 | my ($parser, $paragraph, $line_num) = @_; | |
42 | ## Format verbatim paragraph; sample actions might be: | |
43 | my $out_fh = $parser->output_handle(); | |
44 | print $out_fh $paragraph; | |
45 | } | |
46 | ||
47 | sub textblock { | |
48 | my ($parser, $paragraph, $line_num) = @_; | |
49 | ## Translate/Format this block of text; sample actions might be: | |
50 | my $out_fh = $parser->output_handle(); | |
51 | my $expansion = $parser->interpolate($paragraph, $line_num); | |
52 | print $out_fh $expansion; | |
53 | } | |
54 | ||
55 | sub interior_sequence { | |
56 | my ($parser, $seq_command, $seq_argument) = @_; | |
57 | ## Expand an interior sequence; sample actions might be: | |
66aff6dd GS |
58 | return "*$seq_argument*" if ($seq_command eq 'B'); |
59 | return "`$seq_argument'" if ($seq_command eq 'C'); | |
60 | return "_${seq_argument}_" if ($seq_command eq 'I'); | |
360aca43 GS |
61 | ## ... other sequence commands and their resulting text |
62 | } | |
63 | ||
64 | package main; | |
65 | ||
66 | ## Create a parser object and have it parse file whose name was | |
67 | ## given on the command-line (use STDIN if no files were given). | |
68 | $parser = new MyParser(); | |
69 | $parser->parse_from_filehandle(\*STDIN) if (@ARGV == 0); | |
70 | for (@ARGV) { $parser->parse_from_file($_); } | |
71 | ||
72 | =head1 REQUIRES | |
73 | ||
828c4421 | 74 | perl5.005, Pod::InputObjects, Exporter, Symbol, Carp |
360aca43 GS |
75 | |
76 | =head1 EXPORTS | |
77 | ||
78 | Nothing. | |
79 | ||
80 | =head1 DESCRIPTION | |
81 | ||
82 | B<Pod::Parser> is a base class for creating POD filters and translators. | |
83 | It handles most of the effort involved with parsing the POD sections | |
84 | from an input stream, leaving subclasses free to be concerned only with | |
85 | performing the actual translation of text. | |
86 | ||
87 | B<Pod::Parser> parses PODs, and makes method calls to handle the various | |
88 | components of the POD. Subclasses of B<Pod::Parser> override these methods | |
89 | to translate the POD into whatever output format they desire. | |
90 | ||
91 | =head1 QUICK OVERVIEW | |
92 | ||
93 | To create a POD filter for translating POD documentation into some other | |
94 | format, you create a subclass of B<Pod::Parser> which typically overrides | |
95 | just the base class implementation for the following methods: | |
96 | ||
97 | =over 2 | |
98 | ||
99 | =item * | |
100 | ||
101 | B<command()> | |
102 | ||
103 | =item * | |
104 | ||
105 | B<verbatim()> | |
106 | ||
107 | =item * | |
108 | ||
109 | B<textblock()> | |
110 | ||
111 | =item * | |
112 | ||
113 | B<interior_sequence()> | |
114 | ||
115 | =back | |
116 | ||
117 | You may also want to override the B<begin_input()> and B<end_input()> | |
118 | methods for your subclass (to perform any needed per-file and/or | |
119 | per-document initialization or cleanup). | |
120 | ||
7b47f8ec | 121 | If you need to perform any preprocessing of input before it is parsed, |
360aca43 GS |
122 | you may want to override B<preprocess_line()> and/or |
123 | B<preprocess_paragraph()>. | |
124 | ||
125 | Sometimes it may be necessary to make more than one pass over the input | |
126 | files. If this is the case you have several options. You can make the | |
127 | first pass using B<Pod::Parser> and override your methods to store the | |
128 | intermediate results in memory somewhere for the B<end_pod()> method to | |
129 | process. You could use B<Pod::Parser> for several passes with an | |
130 | appropriate state variable to control the operation for each pass. If | |
131 | your input source can't be reset to start at the beginning, you can | |
132 | store it in some other structure as a string or an array and have that | |
133 | structure implement a B<getline()> method (which is all that | |
134 | B<parse_from_filehandle()> uses to read input). | |
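
The last option above (an in-memory structure with a B<getline()> method) might look like the following minimal sketch. The class name C<LineBuffer> and its C<rewind()> method are hypothetical and not part of B<Pod::Parser>; the sketch assumes, as stated above, that B<getline()> is the only method needed to read input.

```perl
use strict;
use warnings;

## Hypothetical helper class (not part of Pod::Parser): serves lines
## from an in-memory array so the same input can be read more than once.
package LineBuffer;

sub new {
    my ($class, @lines) = @_;
    return bless { lines => [@lines], pos => 0 }, $class;
}

## Return the next line, or undef once the input is exhausted.
sub getline {
    my $self = shift;
    return if $self->{pos} >= @{ $self->{lines} };
    return $self->{lines}[ $self->{pos}++ ];
}

## Start again from the first line (for a second pass).
sub rewind {
    my $self = shift;
    $self->{pos} = 0;
    return;
}

package main;

my $buf = LineBuffer->new("=head1 NAME\n", "\n", "Example text\n");

## First pass: count the lines.
my $count = 0;
$count++ while defined $buf->getline();

## Second pass: start over and read the first line again.
$buf->rewind();
my $first = $buf->getline();
print "read $count lines\n";
```

An object like C<$buf> could then be handed to B<parse_from_filehandle()> once per pass, calling C<rewind()> in between.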
135 | ||
136 | Feel free to add any member data fields you need to keep track of things | |
137 | like current font, indentation, horizontal or vertical position, or | |
138 | whatever else you like. Be sure to read L<"PRIVATE METHODS AND DATA"> | |
139 | to avoid name collisions. | |
140 | ||
141 | For the most part, the B<Pod::Parser> base class should be able to | |
142 | do most of the input parsing for you and leave you free to worry about | |
267d5541 | 143 | how to interpret the commands and translate the result. |
360aca43 | 144 | |
66aff6dd GS |
145 | Note that all we have described here in this quick overview is the |
146 | simplest, most straightforward use of B<Pod::Parser> to do stream-based | |
664bb207 GS |
147 | parsing. It is also possible to use the B<Pod::Parser::parse_text> function |
148 | to do more sophisticated tree-based parsing. See L<"TREE-BASED PARSING">. | |
149 | ||
150 | =head1 PARSING OPTIONS | |
151 | ||
152 | A I<parse-option> is simply a named option of B<Pod::Parser> with a | |
153 | value that corresponds to a certain specified behavior. These various | |
d1be9408 | 154 | behaviors of B<Pod::Parser> may be enabled/disabled by setting |
664bb207 GS |
155 | or unsetting one or more I<parse-options> using the B<parseopts()> method. |
156 | The set of currently accepted parse-options is as follows: | |
157 | ||
158 | =over 3 | |
159 | ||
160 | =item B<-want_nonPODs> (default: unset) | |
161 | ||
162 | Normally (by default) B<Pod::Parser> will only provide access to | |
163 | the POD sections of the input. Input paragraphs that are not part | |
164 | of the POD-format documentation are not made available to the caller | |
165 | (not even using B<preprocess_paragraph()>). Setting this option to a | |
166 | non-empty, non-zero value will allow B<preprocess_paragraph()> to see | |
e3237417 | 167 | non-POD sections of the input as well as POD sections. The B<cutting()> |
664bb207 GS |
168 | method can be used to determine if the corresponding paragraph is a POD |
169 | paragraph, or some other input paragraph. | |
170 | ||
171 | =item B<-process_cut_cmd> (default: unset) | |
172 | ||
173 | Normally (by default) B<Pod::Parser> handles the C<=cut> POD directive | |
174 | by itself and does not pass it on to the caller for processing. Setting | |
a5317591 | 175 | this option to a non-empty, non-zero value will cause B<Pod::Parser> to |
664bb207 GS |
176 | pass the C<=cut> directive to the caller just like any other POD command |
177 | (and hence it may be processed by the B<command()> method). | |
178 | ||
179 | B<Pod::Parser> will still interpret the C<=cut> directive to mean that | |
180 | "cutting mode" has been (re)entered, but the caller will get a chance | |
181 | to capture the actual C<=cut> paragraph itself for whatever purpose | |
182 | it desires. | |
183 | ||
a5317591 GS |
184 | =item B<-warnings> (default: unset) |
185 | ||
186 | Normally (by default) B<Pod::Parser> recognizes a bare minimum of | |
187 | pod syntax errors and warnings and issues diagnostic messages | |
188 | for errors, but not for warnings. (Use B<Pod::Checker> to do more | |
189 | thorough checking of POD syntax.) Setting this option to a non-empty, | |
190 | non-zero value will cause B<Pod::Parser> to issue diagnostics for | |
191 | the few warnings it recognizes as well as the errors. | |
192 | ||
664bb207 GS |
193 | =back |
194 | ||
195 | Please see L<"parseopts()"> for a complete description of the interface | |
196 | for the setting and unsetting of parse-options. | |
197 | ||
360aca43 GS |
198 | =cut |
199 | ||
200 | ############################################################################# | |
201 | ||
202 | use vars qw(@ISA); | |
203 | use strict; | |
204 | #use diagnostics; | |
205 | use Pod::InputObjects; | |
206 | use Carp; | |
360aca43 | 207 | use Exporter; |
828c4421 GS |
208 | BEGIN { |
209 | if ($] < 5.6) { | |
210 | require Symbol; | |
211 | import Symbol; | |
212 | } | |
213 | } | |
360aca43 GS |
214 | @ISA = qw(Exporter); |
215 | ||
216 | ## These "variables" are used as local "glob aliases" for performance | |
664bb207 | 217 | use vars qw(%myData %myOpts @input_stack); |
360aca43 GS |
218 | |
219 | ############################################################################# | |
220 | ||
221 | =head1 RECOMMENDED SUBROUTINE/METHOD OVERRIDES | |
222 | ||
223 | B<Pod::Parser> provides several methods which most subclasses will probably | |
224 | want to override. These methods are as follows: | |
225 | ||
226 | =cut | |
227 | ||
228 | ##--------------------------------------------------------------------------- | |
229 | ||
230 | =head1 B<command()> | |
231 | ||
232 | $parser->command($cmd,$text,$line_num,$pod_para); | |
233 | ||
234 | This method should be overridden by subclasses to take the appropriate | |
235 | action when a POD command paragraph (denoted by a line beginning with | |
236 | "=") is encountered. When such a POD directive is seen in the input, | |
237 | this method is called and is passed: | |
238 | ||
239 | =over 3 | |
240 | ||
241 | =item C<$cmd> | |
242 | ||
243 | the name of the command for this POD paragraph | |
244 | ||
245 | =item C<$text> | |
246 | ||
247 | the paragraph text for the given POD paragraph command. | |
248 | ||
249 | =item C<$line_num> | |
250 | ||
251 | the line-number of the beginning of the paragraph | |
252 | ||
253 | =item C<$pod_para> | |
254 | ||
255 | a reference to a C<Pod::Paragraph> object which contains further | |
256 | information about the paragraph command (see L<Pod::InputObjects> | |
257 | for details). | |
258 | ||
259 | =back | |
260 | ||
261 | B<Note> that this method I<is> called for C<=pod> paragraphs. | |
262 | ||
263 | The base class implementation of this method simply treats the raw POD | |
264 | command as normal block of paragraph text (invoking the B<textblock()> | |
265 | method with the command paragraph). | |
266 | ||
267 | =cut | |
268 | ||
269 | sub command { | |
270 | my ($self, $cmd, $text, $line_num, $pod_para) = @_; | |
271 | ## Just treat this like a textblock | |
272 | $self->textblock($pod_para->raw_text(), $line_num, $pod_para); | |
273 | } | |
274 | ||
275 | ##--------------------------------------------------------------------------- | |
276 | ||
277 | =head1 B<verbatim()> | |
278 | ||
279 | $parser->verbatim($text,$line_num,$pod_para); | |
280 | ||
281 | This method may be overridden by subclasses to take the appropriate | |
282 | action when a block of verbatim text is encountered. It is passed the | |
283 | following parameters: | |
284 | ||
285 | =over 3 | |
286 | ||
287 | =item C<$text> | |
288 | ||
289 | the block of text for the verbatim paragraph | |
290 | ||
291 | =item C<$line_num> | |
292 | ||
293 | the line-number of the beginning of the paragraph | |
294 | ||
295 | =item C<$pod_para> | |
296 | ||
297 | a reference to a C<Pod::Paragraph> object which contains further | |
298 | information about the paragraph (see L<Pod::InputObjects> | |
299 | for details). | |
300 | ||
301 | =back | |
302 | ||
303 | The base class implementation of this method simply prints the textblock | |
304 | (unmodified) to the output filehandle. | |
305 | ||
306 | =cut | |
307 | ||
308 | sub verbatim { | |
309 | my ($self, $text, $line_num, $pod_para) = @_; | |
310 | my $out_fh = $self->{_OUTPUT}; | |
311 | print $out_fh $text; | |
312 | } | |
313 | ||
314 | ##--------------------------------------------------------------------------- | |
315 | ||
316 | =head1 B<textblock()> | |
317 | ||
318 | $parser->textblock($text,$line_num,$pod_para); | |
319 | ||
320 | This method may be overridden by subclasses to take the appropriate | |
321 | action when a normal block of POD text is encountered (although the base | |
322 | class method will usually do what you want). It is passed the following | |
323 | parameters: | |
324 | ||
325 | =over 3 | |
326 | ||
327 | =item C<$text> | |
328 | ||
329 | the block of text for a POD paragraph | |
330 | ||
331 | =item C<$line_num> | |
332 | ||
333 | the line-number of the beginning of the paragraph | |
334 | ||
335 | =item C<$pod_para> | |
336 | ||
337 | a reference to a C<Pod::Paragraph> object which contains further | |
338 | information about the paragraph (see L<Pod::InputObjects> | |
339 | for details). | |
340 | ||
341 | =back | |
342 | ||
343 | In order to process interior sequences, subclass implementations of | |
344 | this method will probably want to invoke either B<interpolate()> or | |
345 | B<parse_text()>, passing it the text block C<$text>, and the corresponding | |
346 | line number in C<$line_num>, and then perform any desired processing upon | |
347 | the returned result. | |
348 | ||
349 | The base class implementation of this method simply prints the text block | |
350 | (as it occurred in the input stream). | |
351 | ||
352 | =cut | |
353 | ||
354 | sub textblock { | |
355 | my ($self, $text, $line_num, $pod_para) = @_; | |
356 | my $out_fh = $self->{_OUTPUT}; | |
357 | print $out_fh $self->interpolate($text, $line_num); | |
358 | } | |
359 | ||
360 | ##--------------------------------------------------------------------------- | |
361 | ||
362 | =head1 B<interior_sequence()> | |
363 | ||
364 | $parser->interior_sequence($seq_cmd,$seq_arg,$pod_seq); | |
365 | ||
366 | This method should be overridden by subclasses to take the appropriate | |
367 | action when an interior sequence is encountered. An interior sequence is | |
368 | an embedded command within a block of text which appears as a command | |
369 | name (usually a single uppercase character) followed immediately by a | |
370 | string of text which is enclosed in angle brackets. This method is | |
371 | passed the sequence command C<$seq_cmd> and the corresponding text | |
372 | C<$seq_arg>. It is invoked by the B<interpolate()> method for each interior | |
373 | sequence that occurs in the string that it is passed. It should return | |
374 | the desired text string to be used in place of the interior sequence. | |
375 | The C<$pod_seq> argument is a reference to a C<Pod::InteriorSequence> | |
376 | object which contains further information about the interior sequence. | |
377 | Please see L<Pod::InputObjects> for details if you need to access this | |
378 | additional information. | |
379 | ||
380 | Subclass implementations of this method may wish to invoke the | |
381 | B<nested()> method of C<$pod_seq> to see if it is nested inside | |
382 | some other interior-sequence (and if so, which kind). | |
383 | ||
384 | The base class implementation of the B<interior_sequence()> method | |
385 | simply returns the raw text of the interior sequence (as it occurred | |
386 | in the input) to the caller. | |
387 | ||
388 | =cut | |
389 | ||
390 | sub interior_sequence { | |
391 | my ($self, $seq_cmd, $seq_arg, $pod_seq) = @_; | |
392 | ## Just return the raw text of the interior sequence | |
393 | return $pod_seq->raw_text(); | |
394 | } | |
395 | ||
396 | ############################################################################# | |
397 | ||
398 | =head1 OPTIONAL SUBROUTINE/METHOD OVERRIDES | |
399 | ||
400 | B<Pod::Parser> provides several methods which subclasses may want to override | |
401 | to perform any special pre/post-processing. These methods do I<not> have to | |
402 | be overridden, but it may be useful for subclasses to take advantage of them. | |
403 | ||
404 | =cut | |
405 | ||
406 | ##--------------------------------------------------------------------------- | |
407 | ||
408 | =head1 B<new()> | |
409 | ||
410 | my $parser = Pod::Parser->new(); | |
411 | ||
412 | This is the constructor for B<Pod::Parser> and its subclasses. You | |
413 | I<do not> need to override this method! It is capable of constructing | |
414 | subclass objects as well as base class objects, provided you use | |
415 | any of the following constructor invocation styles: | |
416 | ||
417 | my $parser1 = MyParser->new(); | |
418 | my $parser2 = new MyParser(); | |
419 | my $parser3 = $parser2->new(); | |
420 | ||
421 | where C<MyParser> is some subclass of B<Pod::Parser>. | |
422 | ||
423 | Using the syntax C<MyParser::new()> to invoke the constructor is I<not> | |
424 | recommended, but if you insist on being able to do this, then the | |
425 | subclass I<will> need to override the B<new()> constructor method. If | |
426 | you do override the constructor, you I<must> be sure to invoke the | |
427 | B<initialize()> method of the newly blessed object. | |
428 | ||
429 | Using any of the above invocations, the first argument to the | |
430 | constructor is always the corresponding package name (or object | |
431 | reference). No other arguments are required, but if desired, an | |
432 | associative array (or hash-table) may be passed to the B<new()> | |
433 | constructor, as in: | |
434 | ||
435 | my $parser1 = MyParser->new( MYDATA => $value1, MOREDATA => $value2 ); | |
436 | my $parser2 = new MyParser( -myflag => 1 ); | |
437 | ||
438 | All arguments passed to the B<new()> constructor will be treated as | |
439 | key/value pairs in a hash-table. The newly constructed object will be | |
440 | initialized by copying the contents of the given hash-table (which may | |
441 | have been empty). The B<new()> constructor for this class and all of its | |
442 | subclasses returns a blessed reference to the initialized object (hash-table). | |
443 | ||
444 | =cut | |
445 | ||
446 | sub new { | |
447 | ## Determine if we were called via an object-ref or a classname | |
448 | my $this = shift; | |
449 | my $class = ref($this) || $this; | |
450 | ## Any remaining arguments are treated as initial values for the | |
451 | ## hash that is used to represent this object. | |
452 | my %params = @_; | |
453 | my $self = { %params }; | |
454 | ## Bless ourselves into the desired class and perform any initialization | |
455 | bless $self, $class; | |
456 | $self->initialize(); | |
457 | return $self; | |
458 | } | |
459 | ||
460 | ##--------------------------------------------------------------------------- | |
461 | ||
462 | =head1 B<initialize()> | |
463 | ||
464 | $parser->initialize(); | |
465 | ||
466 | This method performs any necessary object initialization. It takes no | |
467 | arguments (other than the object instance of course, which is typically | |
468 | copied to a local variable named C<$self>). If subclasses override this | |
469 | method then they I<must> be sure to invoke C<$self-E<gt>SUPER::initialize()>. | |
470 | ||
471 | =cut | |
472 | ||
473 | sub initialize { | |
474 | #my $self = shift; | |
475 | #return; | |
476 | } | |
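
The interplay between B<new()> and B<initialize()> can be sketched as follows. To keep the example self-contained, C<BaseParser> is a stand-in that mirrors the B<new()> and B<initialize()> shown above; with the real module the subclass would inherit from B<Pod::Parser> instead. The subclass name C<MyParser> and the C<INDENT> field are illustrative assumptions.

```perl
use strict;
use warnings;

## Stand-in for Pod::Parser (assumption: mirrors new()/initialize() above).
package BaseParser;

sub new {
    my $this  = shift;
    my $class = ref($this) || $this;
    my $self  = { @_ };          ## copy the key/value arguments
    bless $self, $class;
    $self->initialize();
    return $self;
}

sub initialize { }

## Hypothetical subclass that supplies a default for its own field.
package MyParser;
use vars qw(@ISA);
@ISA = ('BaseParser');

sub initialize {
    my $self = shift;
    $self->SUPER::initialize();  ## always chain to the base class
    $self->{INDENT} = 4 unless defined $self->{INDENT};
    return;
}

package main;

my $p1 = MyParser->new();               ## default INDENT
my $p2 = MyParser->new(INDENT => 2);    ## caller-supplied INDENT
my $p3 = $p2->new();                    ## class taken from an object

print join(',', $p1->{INDENT}, $p2->{INDENT}, $p3->{INDENT}, ref $p3), "\n";
## prints "4,2,4,MyParser"
```

Putting defaults in B<initialize()> rather than in an overridden B<new()> keeps all three invocation styles working without extra code in the subclass.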
477 | ||
478 | ##--------------------------------------------------------------------------- | |
479 | ||
480 | =head1 B<begin_pod()> | |
481 | ||
482 | $parser->begin_pod(); | |
483 | ||
484 | This method is invoked at the beginning of processing for each POD | |
485 | document that is encountered in the input. Subclasses should override | |
486 | this method to perform any per-document initialization. | |
487 | ||
488 | =cut | |
489 | ||
490 | sub begin_pod { | |
491 | #my $self = shift; | |
492 | #return; | |
493 | } | |
494 | ||
495 | ##--------------------------------------------------------------------------- | |
496 | ||
497 | =head1 B<begin_input()> | |
498 | ||
499 | $parser->begin_input(); | |
500 | ||
501 | This method is invoked by B<parse_from_filehandle()> immediately I<before> | |
502 | processing input from a filehandle. The base class implementation does | |
503 | nothing; however, subclasses may override it to perform any per-file | |
504 | initializations. | |
505 | ||
506 | Note that if multiple files are parsed for a single POD document | |
507 | (perhaps the result of some future C<=include> directive) this method | |
508 | is invoked for every file that is parsed. If you wish to perform certain | |
509 | initializations once per document, then you should use B<begin_pod()>. | |
510 | ||
511 | =cut | |
512 | ||
513 | sub begin_input { | |
514 | #my $self = shift; | |
515 | #return; | |
516 | } | |
517 | ||
518 | ##--------------------------------------------------------------------------- | |
519 | ||
520 | =head1 B<end_input()> | |
521 | ||
522 | $parser->end_input(); | |
523 | ||
524 | This method is invoked by B<parse_from_filehandle()> immediately I<after> | |
525 | processing input from a filehandle. The base class implementation does | |
526 | nothing; however, subclasses may override it to perform any per-file | |
527 | cleanup actions. | |
528 | ||
529 | Please note that if multiple files are parsed for a single POD document | |
530 | (perhaps the result of some kind of C<=include> directive) this method | |
531 | is invoked for every file that is parsed. If you wish to perform certain | |
532 | cleanup actions once per document, then you should use B<end_pod()>. | |
533 | ||
534 | =cut | |
535 | ||
536 | sub end_input { | |
537 | #my $self = shift; | |
538 | #return; | |
539 | } | |
540 | ||
541 | ##--------------------------------------------------------------------------- | |
542 | ||
543 | =head1 B<end_pod()> | |
544 | ||
545 | $parser->end_pod(); | |
546 | ||
547 | This method is invoked at the end of processing for each POD document | |
548 | that is encountered in the input. Subclasses should override this method | |
549 | to perform any per-document finalization. | |
550 | ||
551 | =cut | |
552 | ||
553 | sub end_pod { | |
554 | #my $self = shift; | |
555 | #return; | |
556 | } | |
557 | ||
558 | ##--------------------------------------------------------------------------- | |
559 | ||
560 | =head1 B<preprocess_line()> | |
561 | ||
562 | $textline = $parser->preprocess_line($text, $line_num); | |
563 | ||
564 | This method should be overridden by subclasses that wish to perform | |
565 | any kind of preprocessing for each I<line> of input (I<before> it has | |
566 | been determined whether or not it is part of a POD paragraph). The | |
567 | parameter C<$text> is the input line; and the parameter C<$line_num> is | |
568 | the line number of the corresponding text line. | |
569 | ||
570 | The value returned should correspond to the new text to use in its | |
571 | place. If the empty string or an undefined value is returned then no | |
572 | further processing will be performed for this line. | |
573 | ||
574 | Please note that the B<preprocess_line()> method is invoked I<before> | |
575 | the B<preprocess_paragraph()> method. After all (possibly preprocessed) | |
576 | lines in a paragraph have been assembled together and it has been | |
577 | determined that the paragraph is part of the POD documentation from one | |
578 | of the selected sections, then B<preprocess_paragraph()> is invoked. | |
579 | ||
580 | The base class implementation of this method returns the given text. | |
581 | ||
582 | =cut | |
583 | ||
584 | sub preprocess_line { | |
585 | my ($self, $text, $line_num) = @_; | |
586 | return $text; | |
587 | } | |
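
As an illustration of a B<preprocess_line()> override, the sketch below converts DOS-style line endings to a plain newline before the line is examined any further. The package name C<CrlfParser> is hypothetical, and the method is called directly here so the example runs on its own; in real use the package would inherit from B<Pod::Parser>, which would then call the override for every input line.

```perl
use strict;
use warnings;

## Hypothetical subclass; in real use it would also set
## @ISA = qw(Pod::Parser) so that the parser calls this override.
package CrlfParser;

sub new { return bless {}, shift }

## Convert a trailing "\r\n" to "\n" and return the new line text.
sub preprocess_line {
    my ($self, $text, $line_num) = @_;
    $text =~ s/\r\n\z/\n/;
    return $text;
}

package main;

my $parser  = CrlfParser->new();
my $cleaned = $parser->preprocess_line("=head1 NAME\r\n", 1);
print $cleaned eq "=head1 NAME\n" ? "ok\n" : "not ok\n";
```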
588 | ||
589 | ##--------------------------------------------------------------------------- | |
590 | ||
591 | =head1 B<preprocess_paragraph()> | |
592 | ||
593 | $textblock = $parser->preprocess_paragraph($text, $line_num); | |
594 | ||
595 | This method should be overridden by subclasses that wish to perform any | |
596 | kind of preprocessing for each block (paragraph) of POD documentation | |
597 | that appears in the input stream. The parameter C<$text> is the POD | |
598 | paragraph from the input file; and the parameter C<$line_num> is the | |
599 | line number for the beginning of the corresponding paragraph. | |
600 | ||
601 | The value returned should correspond to the new text to use in its | |
602 | place. If the empty string is returned or an undefined value is | |
603 | returned, then the given C<$text> is ignored (not processed). | |
604 | ||
e3237417 GS |
605 | This method is invoked after gathering up all the lines in a paragraph |
606 | and after determining the cutting state of the paragraph, | |
360aca43 GS |
607 | but before trying to further parse or interpret them. After |
608 | B<preprocess_paragraph()> returns, the current cutting state (which | |
609 | is returned by C<$self-E<gt>cutting()>) is examined. If it evaluates | |
e3237417 | 610 | to true then input text (including the given C<$text>) is cut (not |
360aca43 GS |
611 | processed) until the next POD directive is encountered. |
612 | ||
613 | Please note that the B<preprocess_line()> method is invoked I<before> | |
614 | the B<preprocess_paragraph()> method. After all (possibly preprocessed) | |
e3237417 | 615 | lines in a paragraph have been assembled together and either it has been |
360aca43 | 616 | determined that the paragraph is part of the POD documentation from one |
66aff6dd | 617 | of the selected sections or the C<-want_nonPODs> option is true, |
e3237417 | 618 | then B<preprocess_paragraph()> is invoked. |
360aca43 GS |
619 | |
620 | The base class implementation of this method returns the given text. | |
621 | ||
622 | =cut | |
623 | ||
624 | sub preprocess_paragraph { | |
625 | my ($self, $text, $line_num) = @_; | |
626 | return $text; | |
627 | } | |
628 | ||
629 | ############################################################################# | |
630 | ||
631 | =head1 METHODS FOR PARSING AND PROCESSING | |
632 | ||
633 | B<Pod::Parser> provides several methods to process input text. These | |
664bb207 GS |
634 | methods typically won't need to be overridden (and in some cases they |
635 | can't be overridden), but subclasses may want to invoke them to exploit | |
636 | their functionality. | |
360aca43 GS |
637 | |
638 | =cut | |
639 | ||
640 | ##--------------------------------------------------------------------------- | |
641 | ||
642 | =head1 B<parse_text()> | |
643 | ||
644 | $ptree1 = $parser->parse_text($text, $line_num); | |
645 | $ptree2 = $parser->parse_text({%opts}, $text, $line_num); | |
646 | $ptree3 = $parser->parse_text(\%opts, $text, $line_num); | |
647 | ||
648 | This method is useful if you need to perform your own interpolation | |
649 | of interior sequences and can't rely upon B<interpolate> to expand | |
d1be9408 | 650 | them in simple bottom-up order. |
360aca43 GS |
651 | |
652 | The parameter C<$text> is a string or block of text to be parsed | |
653 | for interior sequences; and the parameter C<$line_num> is the | |
267d5541 | 654 | line number corresponding to the beginning of C<$text>. |
360aca43 GS |
655 | |
656 | B<parse_text()> will parse the given text into a parse-tree of "nodes". | |
657 | Each "node" in the parse tree is either a | |
658 | text-string, or a B<Pod::InteriorSequence>. The result returned is a | |
659 | parse-tree of type B<Pod::ParseTree>. Please see L<Pod::InputObjects> | |
660 | for more information about B<Pod::InteriorSequence> and B<Pod::ParseTree>. | |
661 | ||
662 | If desired, an optional hash-ref may be specified as the first argument | |
663 | to customize certain aspects of the parse-tree that is created and | |
664 | returned. The set of recognized option keywords is: | |
665 | ||
666 | =over 3 | |
667 | ||
668 | =item B<-expand_seq> =E<gt> I<code-ref>|I<method-name> | |
669 | ||
670 | Normally, the parse-tree returned by B<parse_text()> will contain an | |
671 | unexpanded C<Pod::InteriorSequence> object for each interior-sequence | |
672 | encountered. Specifying B<-expand_seq> tells B<parse_text()> to "expand" | |
673 | every interior-sequence it sees by invoking the referenced function | |
674 | (or named method of the parser object) and using the return value as the | |
675 | expanded result. | |
676 | ||
677 | If a subroutine reference was given, it is invoked as: | |
678 | ||
679 | &$code_ref( $parser, $sequence ) | |
680 | ||
681 | and if a method-name was given, it is invoked as: | |
682 | ||
683 | $parser->method_name( $sequence ) | |
684 | ||
685 | where C<$parser> is a reference to the parser object, and C<$sequence> | |
686 | is a reference to the interior-sequence object. | |
687 | [I<NOTE>: If the B<interior_sequence()> method is specified, then it is | |
688 | invoked according to the interface specified in L<"interior_sequence()">]. | |
689 | ||
664bb207 GS |
690 | =item B<-expand_text> =E<gt> I<code-ref>|I<method-name> |
691 | ||
692 | Normally, the parse-tree returned by B<parse_text()> will contain a | |
693 | text-string for each contiguous sequence of characters outside of an | |
694 | interior-sequence. Specifying B<-expand_text> tells B<parse_text()> to | |
695 | "preprocess" every such text-string it sees by invoking the referenced | |
696 | function (or named method of the parser object) and using the return value | |
697 | as the preprocessed (or "expanded") result. [Note that if the result is | |
698 | an interior-sequence, then it will I<not> be expanded as specified by the | |
699 | B<-expand_seq> option; any such recursive expansion needs to be handled by | |
700 | the specified callback routine.] | |
701 | ||
702 | If a subroutine reference was given, it is invoked as: | |
703 | ||
704 | &$code_ref( $parser, $text, $ptree_node ) | |
705 | ||
706 | and if a method-name was given, it is invoked as: | |
707 | ||
708 | $parser->method_name( $text, $ptree_node ) | |
709 | ||
710 | where C<$parser> is a reference to the parser object, C<$text> is the | |
711 | text-string encountered, and C<$ptree_node> is a reference to the current | |
712 | node in the parse-tree (usually an interior-sequence object or else the | |
713 | top-level node of the parse-tree). | |
714 | ||
360aca43 GS |
715 | =item B<-expand_ptree> =E<gt> I<code-ref>|I<method-name> |
716 | ||
717 | Rather than returning a C<Pod::ParseTree>, pass the parse-tree as an | |
718 | argument to the referenced subroutine (or named method of the parser | |
719 | object) and return the result instead of the parse-tree object. | |
720 | ||
721 | If a subroutine reference was given, it is invoked as: | |
722 | ||
723 | &$code_ref( $parser, $ptree ) | |
724 | ||
725 | and if a method-name was given, it is invoked as: | |
726 | ||
727 | $parser->method_name( $ptree ) | |
728 | ||
729 | where C<$parser> is a reference to the parser object, and C<$ptree> | |
730 | is a reference to the parse-tree object. | |
731 | ||
732 | =back | |
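
Each option above accepts either a code-ref or a method name. One way to treat the two forms uniformly is to wrap a method name in a closure, so that the callback is always invoked as a code-ref with the object as its first argument. The following is a minimal sketch of that idea only; C<Demo>, C<shout()>, and C<as_callback()> are hypothetical names, not part of B<Pod::Parser>.

```perl
use strict;
use warnings;

## Hypothetical class with one method to be called by name.
package Demo;

sub new   { return bless {}, shift }
sub shout { my ($self, $text) = @_; return uc $text }

package main;

## Turn either a code-ref or a method name into a code-ref that is
## always called as $sub->($object, @args).
sub as_callback {
    my ($spec) = @_;
    return $spec if ref $spec;           ## already a code-ref
    return sub {
        my $obj = shift;
        return $obj->$spec(@_);          ## call the method by name
    };
}

my $obj     = Demo->new();
my $by_name = as_callback('shout');
my $by_ref  = as_callback(sub { my ($o, $t) = @_; return "[$t]" });

my $r1 = $by_name->($obj, 'pod');
my $r2 = $by_ref->($obj, 'pod');
print "$r1 $r2\n";
## prints "POD [pod]"
```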
733 | ||
734 | =cut | |
735 | ||
360aca43 GS |
736 | sub parse_text { |
737 | my $self = shift; | |
738 | local $_ = ''; | |
739 | ||
740 | ## Get options and set any defaults | |
741 | my %opts = (ref $_[0]) ? %{ shift() } : (); | |
742 | my $expand_seq = $opts{'-expand_seq'} || undef; | |
664bb207 | 743 | my $expand_text = $opts{'-expand_text'} || undef; |
360aca43 GS |
744 | my $expand_ptree = $opts{'-expand_ptree'} || undef; |
745 | ||
746 | my $text = shift; | |
747 | my $line = shift; | |
748 | my $file = $self->input_file(); | |
66aff6dd | 749 | my $cmd = ""; |
360aca43 GS |
750 | |
751 | ## Convert method calls into closures, for our convenience | |
752 | my $xseq_sub = $expand_seq; | |
664bb207 | 753 | my $xtext_sub = $expand_text; |
360aca43 | 754 | my $xptree_sub = $expand_ptree; |
e9fdc7d2 | 755 | if (defined $expand_seq and $expand_seq eq 'interior_sequence') { |
360aca43 GS |
756 | ## If 'interior_sequence' is the method to use, we have to pass |
757 | ## more than just the sequence object, we also need to pass the | |
758 | ## sequence name and text. | |
759 | $xseq_sub = sub { | |
760 | my ($self, $iseq) = @_; | |
761 | my $args = join("", $iseq->parse_tree->children); | |
762 | return $self->interior_sequence($iseq->name, $args, $iseq); | |
763 | }; | |
764 | } | |
765 | ref $xseq_sub or $xseq_sub = sub { shift()->$expand_seq(@_) }; | |
664bb207 | 766 | ref $xtext_sub or $xtext_sub = sub { shift()->$expand_text(@_) }; |
360aca43 | 767 | ref $xptree_sub or $xptree_sub = sub { shift()->$expand_ptree(@_) }; |
66aff6dd | 768 | |
360aca43 GS |
769 | ## Keep track of the "current" interior sequence, and maintain a stack |
770 | ## of "in progress" sequences. | |
771 | ## | |
772 | ## NOTE that we push our own "accumulator" at the very beginning of the | |
773 | ## stack. It's really a parse-tree, not a sequence; but it implements | |
774 | ## the methods we need so we can use it to gather-up all the sequences | |
775 | ## and strings we parse. Thus, by the end of our parsing, it should be | |
776 | ## the only thing left on our stack and all we have to do is return it! | |
777 | ## | |
778 | my $seq = Pod::ParseTree->new(); | |
779 | my @seq_stack = ($seq); | |
66aff6dd | 780 | my ($ldelim, $rdelim) = ('', ''); |
360aca43 | 781 | |
faee740f GS |
782 | ## Iterate over all sequence starts in the text (NOTE: split with | |
783 | ## capturing parens keeps the delimiters) | |
360aca43 | 784 | $_ = $text; |
39a52d2c | 785 | my @tokens = split /([A-Z]<(?:<+\s)?)/; |
66aff6dd GS |
786 | while ( @tokens ) { |
787 | $_ = shift @tokens; | |
faee740f | 788 | ## Look for the beginning of a sequence |
39a52d2c | 789 | if ( /^([A-Z])(<(?:<+\s)?)$/ ) { |
e9fdc7d2 | 790 | ## Push a new sequence onto the stack of those "in-progress" |
c23d1eb0 MR |
791 | my $ldelim_orig; |
792 | ($cmd, $ldelim_orig) = ($1, $2); | |
793 | ($ldelim = $ldelim_orig) =~ s/\s+$//; | |
794 | ($rdelim = $ldelim) =~ tr/</>/; | |
360aca43 | 795 | $seq = Pod::InteriorSequence->new( |
66aff6dd | 796 | -name => $cmd, |
c23d1eb0 | 797 | -ldelim => $ldelim_orig, -rdelim => $rdelim, |
66aff6dd | 798 | -file => $file, -line => $line |
360aca43 GS |
799 | ); |
800 | (@seq_stack > 1) and $seq->nested($seq_stack[-1]); | |
801 | push @seq_stack, $seq; | |
802 | } | |
66aff6dd GS |
803 | ## Look for sequence ending |
804 | elsif ( @seq_stack > 1 ) { | |
805 | ## Make sure we match the right kind of closing delimiter | |
806 | my ($seq_end, $post_seq) = ("", ""); | |
807 | if ( ($ldelim eq '<' and /\A(.*?)(>)/s) | |
808 | or /\A(.*?)(\s+$rdelim)/s ) | |
809 | { | |
810 | ## Found end-of-sequence, capture the interior and the | |
811 | ## closing delimiter, and put the rest back on the | |
812 | ## token-list | |
813 | $post_seq = substr($_, length($1) + length($2)); | |
814 | ($_, $seq_end) = ($1, $2); | |
815 | (length $post_seq) and unshift @tokens, $post_seq; | |
816 | } | |
817 | if (length) { | |
818 | ## In the middle of a sequence, append this text to it, and | |
819 | ## dont forget to "expand" it if that's what the caller wanted | |
820 | $seq->append($expand_text ? &$xtext_sub($self,$_,$seq) : $_); | |
821 | $_ .= $seq_end; | |
822 | } | |
823 | if (length $seq_end) { | |
824 | ## End of current sequence, record terminating delimiter | |
825 | $seq->rdelim($seq_end); | |
826 | ## Pop it off the stack of "in progress" sequences | |
827 | pop @seq_stack; | |
828 | ## Append result to its parent in current parse tree | |
829 | $seq_stack[-1]->append($expand_seq ? &$xseq_sub($self,$seq) | |
830 | : $seq); | |
831 | ## Remember the current cmd-name and left-delimiter | |
c23d1eb0 MR |
832 | if(@seq_stack > 1) { |
833 | $cmd = $seq_stack[-1]->name; | |
834 | $ldelim = $seq_stack[-1]->ldelim; | |
835 | $rdelim = $seq_stack[-1]->rdelim; | |
836 | } else { | |
837 | $cmd = $ldelim = $rdelim = ''; | |
838 | } | |
66aff6dd | 839 | } |
360aca43 | 840 | } |
664bb207 GS |
841 | elsif (length) { |
842 | ## In the middle of a sequence, append this text to it, and | |
843 | ## dont forget to "expand" it if that's what the caller wanted | |
844 | $seq->append($expand_text ? &$xtext_sub($self,$_,$seq) : $_); | |
360aca43 | 845 | } |
66aff6dd | 846 | ## Keep track of line count |
267d5541 | 847 | $line += s/\r*\n//; |
66aff6dd GS |
848 | ## Remember the "current" sequence |
849 | $seq = $seq_stack[-1]; | |
360aca43 GS |
850 | } |
851 | ||
852 | ## Handle unterminated sequences | |
664bb207 | 853 | my $errorsub = (@seq_stack > 1) ? $self->errorsub() : undef; |
360aca43 GS |
854 | while (@seq_stack > 1) { |
855 | ($cmd, $file, $line) = ($seq->name, $seq->file_line); | |
66aff6dd GS |
856 | $ldelim = $seq->ldelim; |
857 | ($rdelim = $ldelim) =~ tr/</>/; | |
858 | $rdelim =~ s/^(\S+)(\s*)$/$2$1/; | |
360aca43 | 859 | pop @seq_stack; |
a5317591 | 860 | my $errmsg = "*** ERROR: unterminated ${cmd}${ldelim}...${rdelim}". |
66aff6dd | 861 | " at line $line in file $file\n"; |
664bb207 | 862 | (ref $errorsub) and &{$errorsub}($errmsg) |
f5daac4a | 863 | or (defined $errorsub) and $self->$errorsub($errmsg) |
664bb207 | 864 | or warn($errmsg); |
360aca43 GS |
865 | $seq_stack[-1]->append($expand_seq ? &$xseq_sub($self,$seq) : $seq); |
866 | $seq = $seq_stack[-1]; | |
867 | } | |
868 | ||
869 | ## Return the resulting parse-tree | |
870 | my $ptree = (pop @seq_stack)->parse_tree; | |
871 | return $expand_ptree ? &$xptree_sub($self, $ptree) : $ptree; | |
872 | } | |
873 | ||
874 | ##--------------------------------------------------------------------------- | |
875 | ||
876 | =head1 B<interpolate()> | |
877 | ||
878 | $textblock = $parser->interpolate($text, $line_num); | |
879 | ||
880 | This method translates all text (including any embedded interior sequences) | |
881 | in the given text string C<$text> and returns the interpolated result. The | |
882 | parameter C<$line_num> is the line number corresponding to the beginning | |
883 | of C<$text>. | |
884 | ||
885 | B<interpolate()> merely invokes a private method to recursively expand | |
886 | nested interior sequences in bottom-up order (innermost sequences are | |
887 | expanded first). If there is a need to expand nested sequences in | |
888 | some alternate order, use B<parse_text> instead. | |
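
The bottom-up order described above can be illustrated with a small stack-based sketch. It is a simplified illustration, not the module's implementation: it handles only single-angle delimiters, and C<expand> and C<interpolate_demo> are hypothetical names.

```perl
use strict;
use warnings;

# Hypothetical expander: wraps the already-expanded argument in a tag
sub expand {
    my ($name, $arg) = @_;
    return "[$name:$arg]";
}

sub interpolate_demo {
    my ($text) = @_;
    # The bottom stack entry accumulates the final result
    my @stack = ({ name => '', text => '' });
    # split with capturing parens keeps the delimiters as tokens
    for my $tok (split /([A-Z]<|>)/, $text) {
        next unless length $tok;
        if ($tok =~ /^([A-Z])<$/) {
            ## Start of a sequence: push a new "in progress" entry
            push @stack, { name => $1, text => '' };
        }
        elsif ($tok eq '>' and @stack > 1) {
            ## End of the innermost sequence: expand it, append to parent
            my $done = pop @stack;
            $stack[-1]{text} .= expand($done->{name}, $done->{text});
        }
        else {
            $stack[-1]{text} .= $tok;
        }
    }
    ## Unterminated sequences are expanded as if closed at end of text
    while (@stack > 1) {
        my $done = pop @stack;
        $stack[-1]{text} .= expand($done->{name}, $done->{text});
    }
    return $stack[0]{text};
}

my $out = interpolate_demo('see B<bold I<both>> text');
print "$out\n";
```

Because a sequence is expanded only when its closing delimiter is reached, the inner C<I> sequence is expanded before the enclosing C<B> sequence.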
889 | ||
890 | =cut | |
891 | ||
892 | sub interpolate { | |
893 | my($self, $text, $line_num) = @_; | |
894 | my %parse_opts = ( -expand_seq => 'interior_sequence' ); | |
895 | my $ptree = $self->parse_text( \%parse_opts, $text, $line_num ); | |
896 | return join "", $ptree->children(); | |
897 | } | |
898 | ||
899 | ##--------------------------------------------------------------------------- | |
900 | ||
901 | =begin __PRIVATE__ | |
902 | ||
903 | =head1 B<parse_paragraph()> | |
904 | ||
905 | $parser->parse_paragraph($text, $line_num); | |
906 | ||
907 | This method takes the text of a POD paragraph to be processed, along | |
908 | with its corresponding line number, and invokes the appropriate method | |
909 | (one of B<command()>, B<verbatim()>, or B<textblock()>). | |
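
As a rough sketch of the three-way dispatch, the function below classifies a paragraph by its leading characters. The name C<paragraph_type> is hypothetical, and the real method also handles cutting state, preprocessing, and section selection.

```perl
use strict;
use warnings;

# Decide which callback a paragraph would be routed to
sub paragraph_type {
    my ($text) = @_;
    return 'command'  if $text =~ /^={1,2}\S/;    # "=name" or "==name"
    return 'verbatim' if $text =~ /^\s+/;         # indented text
    return 'textblock';                           # ordinary block of text
}

my $cmd_type  = paragraph_type("=head1 NAME\n");
my $verb_type = paragraph_type("    my \$x = 1;\n");
my $text_type = paragraph_type("Plain prose.\n");
print "$cmd_type $verb_type $text_type\n";
```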
910 | ||
664bb207 GS |
911 | For performance reasons, this method is invoked directly without any |
912 | dynamic lookup; hence subclasses may I<not> override it! | |
360aca43 GS |
913 | |
914 | =end __PRIVATE__ | |
915 | ||
916 | =cut | |
917 | ||
918 | sub parse_paragraph { | |
919 | my ($self, $text, $line_num) = @_; | |
664bb207 GS |
920 | local *myData = $self; ## alias to avoid deref-ing overhead |
921 | local *myOpts = ($myData{_PARSEOPTS} ||= {}); ## get parse-options | |
360aca43 GS |
922 | local $_; |
923 | ||
664bb207 | 924 | ## See if we want to preprocess nonPOD paragraphs as well as POD ones. |
e3237417 GS |
925 | my $wantNonPods = $myOpts{'-want_nonPODs'}; |
926 | ||
927 | ## Update cutting status | |
928 | $myData{_CUTTING} = 0 if $text =~ /^={1,2}\S/; | |
664bb207 GS |
929 | |
930 | ## Perform any desired preprocessing if we wanted it this early | |
931 | $wantNonPods and $text = $self->preprocess_paragraph($text, $line_num); | |
932 | ||
360aca43 | 933 | ## Ignore up until next POD directive if we are cutting |
e3237417 | 934 | return if $myData{_CUTTING}; |
360aca43 GS |
935 | |
936 | ## Now we know this is a block of text in a POD section! | |
937 | ||
938 | ##----------------------------------------------------------------- | |
939 | ## This is a hook (hack ;-) for Pod::Select to do its thing without | |
940 | ## having to override methods, but also without Pod::Parser assuming | |
941 | ## $self is an instance of Pod::Select (if the _SELECTED_SECTIONS | |
942 | ## field exists then we assume there is an is_selected() method for | |
943 | ## us to invoke (calling $self->can('is_selected') could verify this | |
944 | ## but that is more overhead than I want to incur) | |
945 | ##----------------------------------------------------------------- | |
946 | ||
947 | ## Ignore this block if it isnt in one of the selected sections | |
948 | if (exists $myData{_SELECTED_SECTIONS}) { | |
949 | $self->is_selected($text) or return ($myData{_CUTTING} = 1); | |
950 | } | |
951 | ||
664bb207 GS |
952 | ## If we havent already, perform any desired preprocessing and |
953 | ## then re-check the "cutting" state | |
954 | unless ($wantNonPods) { | |
955 | $text = $self->preprocess_paragraph($text, $line_num); | |
956 | return 1 unless ((defined $text) and (length $text)); | |
957 | return 1 if ($myData{_CUTTING}); | |
958 | } | |
360aca43 GS |
959 | |
960 | ## Look for one of the three types of paragraphs | |
961 | my ($pfx, $cmd, $arg, $sep) = ('', '', '', ''); | |
962 | my $pod_para = undef; | |
963 | if ($text =~ /^(={1,2})(?=\S)/) { | |
964 | ## Looks like a command paragraph. Capture the command prefix used | |
965 | ## ("=" or "=="), as well as the command-name, its paragraph text, | |
966 | ## and whatever sequence of characters was used to separate them | |
967 | $pfx = $1; | |
968 | $_ = substr($text, length $pfx); | |
d23ed1f2 | 969 | ($cmd, $sep, $text) = split /(\s+)/, $_, 2; |
360aca43 GS |
970 | ## If this is a "cut" directive then we dont need to do anything |
971 | ## except return to "cutting" mode. | |
972 | if ($cmd eq 'cut') { | |
973 | $myData{_CUTTING} = 1; | |
664bb207 | 974 | return unless $myOpts{'-process_cut_cmd'}; |
360aca43 GS |
975 | } |
976 | } | |
977 | ## Save the attributes indicating how the command was specified. | |
978 | $pod_para = new Pod::Paragraph( | |
979 | -name => $cmd, | |
980 | -text => $text, | |
981 | -prefix => $pfx, | |
982 | -separator => $sep, | |
983 | -file => $myData{_INFILE}, | |
984 | -line => $line_num | |
985 | ); | |
986 | # ## Invoke appropriate callbacks | |
987 | # if (exists $myData{_CALLBACKS}) { | |
988 | # ## Look through the callback list, invoke callbacks, | |
989 | # ## then see if we need to do the default actions | |
990 | # ## (invoke_callbacks will return true if we do). | |
991 | # return 1 unless $self->invoke_callbacks($cmd, $text, $line_num, $pod_para); | |
992 | # } | |
993 | if (length $cmd) { | |
994 | ## A command paragraph | |
995 | $self->command($cmd, $text, $line_num, $pod_para); | |
996 | } | |
997 | elsif ($text =~ /^\s+/) { | |
998 | ## Indented text - must be a verbatim paragraph | |
999 | $self->verbatim($text, $line_num, $pod_para); | |
1000 | } | |
1001 | else { | |
1002 | ## Looks like an ordinary block of text | |
1003 | $self->textblock($text, $line_num, $pod_para); | |
1004 | } | |
1005 | return 1; | |
1006 | } | |
1007 | ||
1008 | ##--------------------------------------------------------------------------- | |
1009 | ||
1010 | =head1 B<parse_from_filehandle()> | |
1011 | ||
1012 | $parser->parse_from_filehandle($in_fh,$out_fh); | |
1013 | ||
1014 | This method takes an input filehandle (which is assumed to already be | |
1015 | opened for reading) and reads the entire input stream looking for blocks | |
1016 | (paragraphs) of POD documentation to be processed. If no first argument | |
1017 | is given the default input filehandle C<STDIN> is used. | |
1018 | ||
1019 | The C<$in_fh> parameter may be any object that provides a B<getline()> | |
1020 | method to retrieve a single line of input text (hence, an appropriate | |
1021 | wrapper object could be used to parse PODs from a single string or an | |
1022 | array of strings). | |
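
A wrapper of the kind mentioned above might look like the following sketch. The class name C<LineSource> is hypothetical; the only requirement taken from the text is a B<getline()> method that returns one line at a time and C<undef> at end of input.

```perl
use strict;
use warnings;

package LineSource;    # hypothetical helper, not part of Pod::Parser

sub new {
    my ($class, @lines) = @_;
    return bless { lines => [@lines] }, $class;
}

# Return the next line, or undef once the input is exhausted
sub getline {
    my ($self) = @_;
    return shift @{ $self->{lines} };
}

package main;

my $src = LineSource->new("=head1 NAME\n", "\n", "demo - example\n");
my @seen;
while (defined(my $line = $src->getline)) {
    push @seen, $line;
}
print scalar(@seen), " lines read\n";
```

An object like C<$src> would presumably be passed as the first argument to B<parse_from_filehandle()> in place of a real filehandle.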
1023 | ||
1024 | Using C<$in_fh-E<gt>getline()>, input is read line-by-line and assembled | |
1025 | into paragraphs or "blocks" (which are separated by lines containing | |
1026 | nothing but whitespace). For each block of POD documentation | |
1027 | encountered it will invoke a method to parse the given paragraph. | |
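
The paragraph assembly described above can be sketched as follows. This is a simplified illustration under the stated rule (a whitespace-only line ends the current paragraph); the function name C<paragraphs> is hypothetical, and the sketch omits line preprocessing and the one-line C<==> command form.

```perl
use strict;
use warnings;

# Group lines into paragraphs; a whitespace-only line ends the current one
sub paragraphs {
    my @lines = @_;
    my @paras;
    my $current = '';
    for my $line (@lines) {
        $current .= $line;
        ## A blank line terminates a non-empty paragraph
        if ($line =~ /^\s*$/ and length $current) {
            push @paras, $current;
            $current = '';
        }
    }
    ## Keep the last paragraph even without a trailing blank line
    push @paras, $current if length $current;
    return @paras;
}

my @paras = paragraphs(
    "=head1 NAME\n", "\n",
    "First line\n", "second line\n", "   \n",
    "tail\n",
);
print scalar(@paras), " paragraphs\n";
```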
1028 | ||
1029 | If a second argument is given then it should correspond to a filehandle where | |
1030 | output should be sent (otherwise the default output filehandle is | |
1031 | C<STDOUT> if no output filehandle is currently in use). | |
1032 | ||
1033 | B<NOTE:> For performance reasons, this method caches the input stream at | |
1034 | the top of the stack in a local variable. Any attempts by clients to | |
1035 | change the stack contents during processing, while this method is | |
1036 | executing, I<will not affect> the input stream used by the current | |
1037 | invocation of this method. | |
1038 | ||
1039 | This method does I<not> usually need to be overridden by subclasses. | |
1040 | ||
1041 | =cut | |
1042 | ||
1043 | sub parse_from_filehandle { | |
1044 | my $self = shift; | |
1045 | my %opts = (ref $_[0] eq 'HASH') ? %{ shift() } : (); | |
1046 | my ($in_fh, $out_fh) = @_; | |
22641bdf | 1047 | $in_fh = \*STDIN unless ($in_fh); |
a5317591 GS |
1048 | local *myData = $self; ## alias to avoid deref-ing overhead |
1049 | local *myOpts = ($myData{_PARSEOPTS} ||= {}); ## get parse-options | |
360aca43 GS |
1050 | local $_; |
1051 | ||
1052 | ## Put this stream at the top of the stack and do beginning-of-input | |
1053 | ## processing. NOTE that $in_fh might be reset during this process. | |
1054 | my $topstream = $self->_push_input_stream($in_fh, $out_fh); | |
1055 | (exists $opts{-cutting}) and $self->cutting( $opts{-cutting} ); | |
1056 | ||
1057 | ## Initialize line/paragraph | |
1058 | my ($textline, $paragraph) = ('', ''); | |
1059 | my ($nlines, $plines) = (0, 0); | |
1060 | ||
1061 | ## Use <$fh> instead of $fh->getline where possible (for speed) | |
1062 | $_ = ref $in_fh; | |
1063 | my $tied_fh = (/^(?:GLOB|FileHandle|IO::\w+)$/ or tied $in_fh); | |
1064 | ||
1065 | ## Read paragraphs line-by-line | |
1066 | while (defined ($textline = $tied_fh ? <$in_fh> : $in_fh->getline)) { | |
1067 | $textline = $self->preprocess_line($textline, ++$nlines); | |
1068 | next unless ((defined $textline) && (length $textline)); | |
360aca43 GS |
1069 | |
1070 | if ((! length $paragraph) && ($textline =~ /^==/)) { | |
1071 | ## '==' denotes a one-line command paragraph | |
1072 | $paragraph = $textline; | |
1073 | $plines = 1; | |
1074 | $textline = ''; | |
1075 | } else { | |
1076 | ## Append this line to the current paragraph | |
1077 | $paragraph .= $textline; | |
1078 | ++$plines; | |
1079 | } | |
1080 | ||
66aff6dd | 1081 | ## See if this line is blank and ends the current paragraph. |
360aca43 | 1082 | ## If it isnt, then keep iterating until it is. |
a5317591 GS |
1083 | next unless (($textline =~ /^([^\S\r\n]*)[\r\n]*$/) |
1084 | && (length $paragraph)); | |
66aff6dd GS |
1085 | |
1086 | ## Issue a warning about any non-empty blank lines | |
92e3d63a | 1087 | if (length($1) > 0 and $myOpts{'-warnings'} and ! $myData{_CUTTING}) { |
a5317591 GS |
1088 | my $errorsub = $self->errorsub(); |
1089 | my $file = $self->input_file(); | |
a5317591 GS |
1090 | my $errmsg = "*** WARNING: line containing nothing but whitespace". |
1091 | " in paragraph at line $nlines in file $file\n"; | |
1092 | (ref $errorsub) and &{$errorsub}($errmsg) | |
1093 | or (defined $errorsub) and $self->$errorsub($errmsg) | |
1094 | or warn($errmsg); | |
1095 | } | |
360aca43 GS |
1096 | |
1097 | ## Now process the paragraph | |
1098 | parse_paragraph($self, $paragraph, ($nlines - $plines) + 1); | |
1099 | $paragraph = ''; | |
1100 | $plines = 0; | |
1101 | } | |
1102 | ## Dont forget about the last paragraph in the file | |
1103 | if (length $paragraph) { | |
1104 | parse_paragraph($self, $paragraph, ($nlines - $plines) + 1) | |
1105 | } | |
1106 | ||
1107 | ## Now pop the input stream off the top of the input stack. | |
1108 | $self->_pop_input_stream(); | |
1109 | } | |
1110 | ||
1111 | ##--------------------------------------------------------------------------- | |
1112 | ||
1113 | =head1 B<parse_from_file()> | |
1114 | ||
1115 | $parser->parse_from_file($filename,$outfile); | |
1116 | ||
1117 | This method takes a filename and does the following: | |
1118 | ||
1119 | =over 2 | |
1120 | ||
1121 | =item * | |
1122 | ||
1123 | opens the input file for reading and the output file for writing | |
1124 | (creating the appropriate filehandles) | |
1125 | ||
1126 | =item * | |
1127 | ||
1128 | invokes the B<parse_from_filehandle()> method passing it the | |
1129 | corresponding input and output filehandles. | |
1130 | ||
1131 | =item * | |
1132 | ||
1133 | closes the input and output files. | |
1134 | ||
1135 | =back | |
1136 | ||
1137 | If the special input filename "-" or "<&STDIN" is given then the STDIN | |
1138 | filehandle is used for input (and no open or close is performed). If no | |
1139 | input filename is specified then "-" is implied. | |
1140 | ||
1141 | If a second argument is given then it should be the name of the desired | |
1142 | output file. If the special output filename "-" or ">&STDOUT" is given | |
1143 | then the STDOUT filehandle is used for output (and no open or close is | |
1144 | performed). If the special output filename ">&STDERR" is given then the | |
1145 | STDERR filehandle is used for output (and no open or close is | |
1146 | performed). If no output filehandle is currently in use and no output | |
1147 | filename is specified, then "-" is implied. | |
d5c61f7c RGS |
1148 | Alternatively, an L<IO::String> object is also accepted as an output |
1149 | file handle. | |
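
The special output names listed above can be summarized in a small classifier. This is a sketch only: C<output_target> is a hypothetical name, and it ignores the case where an output stream is already in use.

```perl
use strict;
use warnings;

# Map an output argument to the kind of destination it implies
sub output_target {
    my ($outfile) = @_;
    return 'handle' if ref $outfile;
    return 'stdout'
        if !defined($outfile) || !length($outfile) || $outfile eq '-'
        || $outfile =~ /^>&?(?:STDOUT|1)$/i;
    return 'stderr' if $outfile =~ /^>&(?:STDERR|2)$/i;
    return 'file';    # anything else is treated as a filename
}

my @kinds = map { output_target($_) }
    (undef, '-', '>&STDOUT', '>&STDERR', 'out.txt', \*STDOUT);
print join(' ', @kinds), "\n";
```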
360aca43 GS |
1150 | |
1151 | This method does I<not> usually need to be overridden by subclasses. | |
1152 | ||
1153 | =cut | |
1154 | ||
1155 | sub parse_from_file { | |
1156 | my $self = shift; | |
1157 | my %opts = (ref $_[0] eq 'HASH') ? %{ shift() } : (); | |
1158 | my ($infile, $outfile) = @_; | |
267d5541 SP |
1159 | my ($in_fh, $out_fh); |
1160 | if ($] < 5.006) { | |
1161 | ($in_fh, $out_fh) = (gensym(), gensym()); | |
1162 | } | |
360aca43 GS |
1163 | my ($close_input, $close_output) = (0, 0); |
1164 | local *myData = $self; | |
d5c61f7c | 1165 | local *_; |
360aca43 GS |
1166 | |
1167 | ## Is $infile a filename or a (possibly implied) filehandle | |
7b47f8ec | 1168 | if (defined $infile && ref $infile) { |
d5c61f7c RGS |
1169 | if (ref($infile) =~ /^(SCALAR|ARRAY|HASH|CODE|REF)$/) { |
1170 | croak "Input from $1 reference not supported!\n"; | |
1171 | } | |
360aca43 GS |
1172 | ## Must be a filehandle-ref (or else assume its a ref to an object |
1173 | ## that supports the common IO read operations). | |
1174 | $myData{_INFILE} = ${$infile}; | |
1175 | $in_fh = $infile; | |
1176 | } | |
7b47f8ec RGS |
1177 | elsif (!defined($infile) || !length($infile) || ($infile eq '-') |
1178 | || ($infile =~ /^<&(?:STDIN|0)$/i)) | |
1179 | { | |
1180 | ## Not a filename, just a string implying STDIN | |
1181 | $infile ||= '-'; | |
1182 | $myData{_INFILE} = "<standard input>"; | |
1183 | $in_fh = \*STDIN; | |
1184 | } | |
360aca43 GS |
1185 | else { |
1186 | ## We have a filename, open it for reading | |
1187 | $myData{_INFILE} = $infile; | |
475d79b5 | 1188 | open($in_fh, "< $infile") or |
360aca43 GS |
1189 | croak "Can't open $infile for reading: $!\n"; |
1190 | $close_input = 1; | |
1191 | } | |
1192 | ||
1193 | ## NOTE: we need to be *very* careful when "defaulting" the output | |
1194 | ## file. We only want to use a default if this is the beginning of | |
1195 | ## the entire document (but *not* if this is an included file). We | |
1196 | ## determine this by seeing if the input stream stack has been set-up | |
1197 | ## already | |
d5c61f7c RGS |
1198 | |
1199 | ## Is $outfile a filename, a (possibly implied) filehandle, maybe a ref? | |
7b47f8ec | 1200 | if (ref $outfile) { |
d5c61f7c RGS |
1201 | ## we need to check for ref() first, as other checks involve reading |
1202 | if (ref($outfile) =~ /^(ARRAY|HASH|CODE)$/) { | |
1203 | croak "Output to $1 reference not supported!\n"; | |
1204 | } | |
1205 | elsif (ref($outfile) eq 'SCALAR') { | |
1206 | # # NOTE: IO::String isn't a part of the perl distribution, | |
1207 | # # so probably we shouldn't support this case... | |
1208 | # require IO::String; | |
1209 | # $myData{_OUTFILE} = "$outfile"; | |
1210 | # $out_fh = IO::String->new($outfile); | |
1211 | croak "Output to SCALAR reference not supported!\n"; | |
360aca43 | 1212 | } |
d5c61f7c | 1213 | else { |
360aca43 GS |
1214 | ## Must be a filehandle-ref (or else assume its a ref to an |
1215 | ## object that supports the common IO write operations). | |
828c4421 | 1216 | $myData{_OUTFILE} = ${$outfile}; |
360aca43 GS |
1217 | $out_fh = $outfile; |
1218 | } | |
d5c61f7c | 1219 | } |
7b47f8ec RGS |
1220 | elsif (!defined($outfile) || !length($outfile) || ($outfile eq '-') |
1221 | || ($outfile =~ /^>&?(?:STDOUT|1)$/i)) | |
1222 | { | |
1223 | if (defined $myData{_TOP_STREAM}) { | |
1224 | $out_fh = $myData{_OUTPUT}; | |
1225 | } | |
1226 | else { | |
1227 | ## Not a filename, just a string implying STDOUT | |
1228 | $outfile ||= '-'; | |
1229 | $myData{_OUTFILE} = "<standard output>"; | |
1230 | $out_fh = \*STDOUT; | |
1231 | } | |
1232 | } | |
d5c61f7c RGS |
1233 | elsif ($outfile =~ /^>&(STDERR|2)$/i) { |
1234 | ## Not a filename, just a string implying STDERR | |
1235 | $myData{_OUTFILE} = "<standard error>"; | |
1236 | $out_fh = \*STDERR; | |
1237 | } | |
1238 | else { | |
1239 | ## We have a filename, open it for writing | |
1240 | $myData{_OUTFILE} = $outfile; | |
1241 | (-d $outfile) and croak "$outfile is a directory, not POD input!\n"; | |
1242 | open($out_fh, "> $outfile") or | |
1243 | croak "Can't open $outfile for writing: $!\n"; | |
1244 | $close_output = 1; | |
360aca43 GS |
1245 | } |
1246 | ||
1247 | ## Whew! That was a lot of work to set up reasonably/robust behavior | |
1248 | ## in the case of a non-filename for reading and writing. Now we just | |
1249 | ## have to parse the input and close the handles when we're finished. | |
1250 | $self->parse_from_filehandle(\%opts, $in_fh, $out_fh); | |
1251 | ||
1252 | $close_input and | |
1253 | close($in_fh) || croak "Can't close $infile after reading: $!\n"; | |
1254 | $close_output and | |
1255 | close($out_fh) || croak "Can't close $outfile after writing: $!\n"; | |
1256 | } | |
1257 | ||
1258 | ############################################################################# | |
1259 | ||
1260 | =head1 ACCESSOR METHODS | |
1261 | ||
1262 | Clients of B<Pod::Parser> should use the following methods to access | |
1263 | instance data fields: | |
1264 | ||
1265 | =cut | |
1266 | ||
1267 | ##--------------------------------------------------------------------------- | |
1268 | ||
664bb207 GS |
1269 | =head1 B<errorsub()> |
1270 | ||
1271 | $parser->errorsub("method_name"); | |
1272 | $parser->errorsub(\&warn_user); | |
1273 | $parser->errorsub(sub { print STDERR @_ }); | |
1274 | ||
1275 | Specifies the method or subroutine to use when printing error messages | |
1276 | about POD syntax. The supplied method/subroutine I<must> return TRUE upon | |
1277 | successful printing of the message. If C<undef> is given, then the B<warn> | |
1278 | builtin is used to issue error messages (this is the default behavior). | |
1279 | ||
1280 | my $errorsub = $parser->errorsub(); | |
1281 | my $errmsg = "This is an error message!\n"; | |
1282 | (ref $errorsub) and &{$errorsub}($errmsg) | |
e3237417 | 1283 | or (defined $errorsub) and $parser->$errorsub($errmsg) |
664bb207 GS |
1284 | or warn($errmsg); |
1285 | ||
1286 | Returns a method name, or else a reference to the user-supplied subroutine | |
1287 | used to print error messages. Returns C<undef> if the B<warn> builtin | |
1288 | is used to issue error messages (this is the default behavior). | |
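
The dispatch idiom above can be tried in isolation with the following sketch. The class C<DemoParser>, its C<remember> method, and the C<report> helper are hypothetical stand-ins; only the accessor shape and the dispatch expression follow the text above.

```perl
use strict;
use warnings;

package DemoParser;    # hypothetical stand-in for a Pod::Parser subclass

sub new { return bless { _ERRORSUB => undef, seen => [] }, shift }

# Combined getter/setter, same shape as the accessor documented above
sub errorsub {
    return (@_ > 1) ? ($_[0]->{_ERRORSUB} = $_[1]) : $_[0]->{_ERRORSUB};
}

# Handler invoked by name; returns TRUE to signal the message was handled
sub remember {
    my ($self, $msg) = @_;
    push @{ $self->{seen} }, "method: $msg";
    return 1;
}

package main;

# Dispatch to a code-ref, then a method name, then fall back to warn
sub report {
    my ($parser, $errmsg) = @_;
    my $errorsub = $parser->errorsub();
    (ref $errorsub) and &{$errorsub}($errmsg)
        or (defined $errorsub) and $parser->$errorsub($errmsg)
        or warn($errmsg);
    return;
}

my $parser = DemoParser->new;
my @collected;

$parser->errorsub('remember');
report($parser, "first\n");

$parser->errorsub(sub { push @collected, @_; return 1 });
report($parser, "second\n");

print scalar(@{ $parser->{seen} }), " by method, ",
      scalar(@collected), " by code-ref\n";
```

If a handler returns a false value, the expression falls through to the next alternative, which is why the handler must return TRUE on success.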
1289 | ||
1290 | =cut | |
1291 | ||
1292 | sub errorsub { | |
1293 | return (@_ > 1) ? ($_[0]->{_ERRORSUB} = $_[1]) : $_[0]->{_ERRORSUB}; | |
1294 | } | |
1295 | ||
1296 | ##--------------------------------------------------------------------------- | |
1297 | ||
360aca43 GS |
1298 | =head1 B<cutting()> |
1299 | ||
1300 | $boolean = $parser->cutting(); | |
1301 | ||
1302 | Returns the current C<cutting> state: a boolean-valued scalar which | |
1303 | evaluates to true if text from the input file is currently being "cut" | |
1304 | (meaning it is I<not> considered part of the POD document). | |
1305 | ||
1306 | $parser->cutting($boolean); | |
1307 | ||
1308 | Sets the current C<cutting> state to the given value and returns the | |
1309 | result. | |
1310 | ||
1311 | =cut | |
1312 | ||
1313 | sub cutting { | |
1314 | return (@_ > 1) ? ($_[0]->{_CUTTING} = $_[1]) : $_[0]->{_CUTTING}; | |
1315 | } | |
1316 | ||
1317 | ##--------------------------------------------------------------------------- | |
1318 | ||
664bb207 GS |
1319 | ##--------------------------------------------------------------------------- |
1320 | ||
1321 | =head1 B<parseopts()> | |
1322 | ||
1323 | When invoked with no additional arguments, B<parseopts> returns a hashtable | |
1324 | of all the current parsing options. | |
1325 | ||
1326 | ## See if we are parsing non-POD sections as well as POD ones | |
1327 | my %opts = $parser->parseopts(); | |
1328 | $opts{'-want_nonPODs'} and print "-want_nonPODs\n"; | |
1329 | ||
1330 | When invoked using a single string, B<parseopts> treats the string as the | |
1331 | name of a parse-option and returns its corresponding value if it exists | |
1332 | (returns C<undef> if it doesn't). | |
1333 | ||
1334 | ## Did we ask to see '=cut' paragraphs? | |
1335 | my $want_cut = $parser->parseopts('-process_cut_cmd'); | |
1336 | $want_cut and print "-process_cut_cmd\n"; | |
1337 | ||
1338 | When invoked with multiple arguments, B<parseopts> treats them as | |
1339 | key/value pairs and the specified parse-option names are set to the | |
1340 | given values. Any unspecified parse-options are unaffected. | |
1341 | ||
1342 | ## Set them back to the default | |
a5317591 | 1343 | $parser->parseopts(-warnings => 0); |
664bb207 GS |
1344 | |
1345 | When passed a single hash-ref, B<parseopts> uses that hash to completely | |
1346 | reset the existing parse-options; all previous parse-option values | |
1347 | are lost. | |
1348 | ||
1349 | ## Reset all options to default | |
1350 | $parser->parseopts( { } ); | |
1351 | ||
a5317591 | 1352 | See L<"PARSING OPTIONS"> for more information on the name and meaning of each |
664bb207 GS |
1353 | parse-option currently recognized. |
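
The four calling forms can be exercised together with the sketch below. It uses a hypothetical C<DemoParser> class with a plain hash-ref implementation, written only to mirror the behavior described above; it is not the module's own code.

```perl
use strict;
use warnings;

package DemoParser;    # hypothetical stand-in, not Pod::Parser itself

sub new { return bless { _PARSEOPTS => {} }, shift }

sub parseopts {
    my $self = shift;
    my $opts = ($self->{_PARSEOPTS} ||= {});
    return %$opts if @_ == 0;                  # no args: all options
    if (@_ == 1) {
        my $arg = shift;
        ## hash-ref: replace everything; string: look up one option
        return ref($arg) ? ($self->{_PARSEOPTS} = $arg) : $opts->{$arg};
    }
    ## key/value pairs: merge into the existing options
    return $self->{_PARSEOPTS} = { %$opts, @_ };
}

package main;

my $parser = DemoParser->new;
$parser->parseopts(-warnings => 1, -want_nonPODs => 1);
my $warnings = $parser->parseopts('-warnings');

$parser->parseopts(-warnings => 0);            # other options unaffected
my %all = $parser->parseopts();

$parser->parseopts({});                        # discard every option
my %after_reset = $parser->parseopts();

print "warnings was $warnings, now $all{'-warnings'}\n";
```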
1354 | ||
1355 | =cut | |
1356 | ||
1357 | sub parseopts { | |
1358 | local *myData = shift; | |
1359 | local *myOpts = ($myData{_PARSEOPTS} ||= {}); | |
1360 | return %myOpts if (@_ == 0); | |
1361 | if (@_ == 1) { | |
1362 | local $_ = shift; | |
1363 | return ref($_) ? $myData{_PARSEOPTS} = $_ : $myOpts{$_}; | |
1364 | } | |
1365 | my @newOpts = (%myOpts, @_); | |
1366 | $myData{_PARSEOPTS} = { @newOpts }; | |
1367 | } | |
1368 | ||
1369 | ##--------------------------------------------------------------------------- | |
1370 | ||
360aca43 GS |
1371 | =head1 B<output_file()> |
1372 | ||
1373 | $fname = $parser->output_file(); | |
1374 | ||
1375 | Returns the name of the output file being written. | |
1376 | ||
1377 | =cut | |
1378 | ||
1379 | sub output_file { | |
1380 | return $_[0]->{_OUTFILE}; | |
1381 | } | |
1382 | ||
1383 | ##--------------------------------------------------------------------------- | |
1384 | ||
1385 | =head1 B<output_handle()> | |
1386 | ||
1387 | $fhandle = $parser->output_handle(); | |
1388 | ||
1389 | Returns the output filehandle object. | |
1390 | ||
1391 | =cut | |
1392 | ||
1393 | sub output_handle { | |
1394 | return $_[0]->{_OUTPUT}; | |
1395 | } | |
1396 | ||
1397 | ##--------------------------------------------------------------------------- | |
1398 | ||
1399 | =head1 B<input_file()> | |
1400 | ||
1401 | $fname = $parser->input_file(); | |
1402 | ||
1403 | Returns the name of the input file being read. | |
1404 | ||
1405 | =cut | |
1406 | ||
1407 | sub input_file { | |
1408 | return $_[0]->{_INFILE}; | |
1409 | } | |
1410 | ||
1411 | ##--------------------------------------------------------------------------- | |
1412 | ||
1413 | =head1 B<input_handle()> | |
1414 | ||
1415 | $fhandle = $parser->input_handle(); | |
1416 | ||
1417 | Returns the current input filehandle object. | |
1418 | ||
1419 | =cut | |
1420 | ||
1421 | sub input_handle { | |
1422 | return $_[0]->{_INPUT}; | |
1423 | } | |
1424 | ||
1425 | ##--------------------------------------------------------------------------- | |
1426 | ||
1427 | =begin __PRIVATE__ | |
1428 | ||
1429 | =head1 B<input_streams()> | |
1430 | ||
1431 | $listref = $parser->input_streams(); | |
1432 | ||
1433 | Returns a reference to an array which corresponds to the stack of all | |
1434 | the input streams that are currently in the middle of being parsed. | |
1435 | ||
1436 | While parsing an input stream, it is possible to invoke | |
1437 | B<parse_from_file()> or B<parse_from_filehandle()> to parse a new input | |
1438 | stream and then return to parsing the previous input stream. Each input | |
1439 | stream to be parsed is pushed onto the end of this input stack | |
1440 | before any of its input is read. The input stream that is currently | |
1441 | being parsed is always at the end (or top) of the input stack. When an | |
1442 | input stream has been exhausted, it is popped off the end of the | |
1443 | input stack. | |
1444 | ||
1445 | Each element on this input stack is a reference to a C<Pod::InputSource> | |
1446 | object. Please see L<Pod::InputObjects> for more details. | |
1447 | ||
1448 | This method might be invoked when printing diagnostic messages, for example, | |
1449 | to obtain the name and line number of all the input files that are currently | |
1450 | being processed. | |
1451 | ||
1452 | =end __PRIVATE__ | |
1453 | ||
1454 | =cut | |
1455 | ||
1456 | sub input_streams { | |
1457 | return $_[0]->{_INPUT_STREAMS}; | |
1458 | } | |
1459 | ||
1460 | ##--------------------------------------------------------------------------- | |
1461 | ||
1462 | =begin __PRIVATE__ | |
1463 | ||
1464 | =head1 B<top_stream()> | |
1465 | ||
1466 | $hashref = $parser->top_stream(); | |
1467 | ||
1468 | Returns a reference to the hash-table that represents the element | |
1469 | that is currently at the top (end) of the input stream stack | |
1470 | (see L<"input_streams()">). The return value will be C<undef> | |
1471 | if the input stack is empty. | |
1472 | ||
1473 | This method might be used when printing diagnostic messages, for example, | |
1474 | to obtain the name and line number of the current input file. | |
1475 | ||
1476 | =end __PRIVATE__ | |
1477 | ||
1478 | =cut | |
1479 | ||
1480 | sub top_stream { | |
1481 | return $_[0]->{_TOP_STREAM} || undef; | |
1482 | } | |
1483 | ||
1484 | ############################################################################# | |
1485 | ||
1486 | =head1 PRIVATE METHODS AND DATA | |
1487 | ||
1488 | B<Pod::Parser> makes use of several internal methods and data fields | |
1489 | which clients should not need to see or use. For the sake of avoiding | |
1490 | name collisions for client data and methods, these methods and fields | |
1491 | are briefly discussed here. Determined hackers may obtain further | |
1492 | information about them by reading the B<Pod::Parser> source code. | |
1493 | ||
1494 | Private data fields are stored in the hash-object whose reference is | |
1495 | returned by the B<new()> constructor for this class. The names of all | |
1496 | private methods and data-fields used by B<Pod::Parser> begin with a | |
1497 | prefix of "_" and match the regular expression C</^_\w+$/>. | |
1498 | ||
1499 | =cut | |
1500 | ||
1501 | ##--------------------------------------------------------------------------- | |
1502 | ||
1503 | =begin _PRIVATE_ | |
1504 | ||
1505 | =head1 B<_push_input_stream()> | |
1506 | ||
1507 | $hashref = $parser->_push_input_stream($in_fh,$out_fh); | |
1508 | ||
1509 | This method will push the given input stream on the input stack and | |
1510 | perform any necessary beginning-of-document or beginning-of-file | |
1511 | processing. The argument C<$in_fh> is the input stream filehandle to | |
1512 | push, and C<$out_fh> is the corresponding output filehandle to use (if | |
1513 | it is not given or is undefined, then the current output stream is used, | |
1514 | which defaults to standard output if it doesn't exist yet). | |
1515 | ||
1516 | The value returned will be a reference to the hash-table that represents | |
1517 | the new top of the input stream stack. I<Please Note> that it is | |
1518 | possible for this method to use default values for the input and output | |
1519 | file handles. If this happens, you will need to look at the C<_INPUT> | |
1520 | and C<_OUTPUT> instance data members to determine their new values. | |
1521 | ||
1522 | =end _PRIVATE_ | |
1523 | ||
1524 | =cut | |
1525 | ||
1526 | sub _push_input_stream { | |
1527 | my ($self, $in_fh, $out_fh) = @_; | |
1528 | local *myData = $self; | |
1529 | ||
1530 | ## Initialize stuff for the entire document if this is *not* | |
1531 | ## an included file. | |
1532 | ## | |
1533 | ## NOTE: we need to be *very* careful when "defaulting" the output | |
1534 | ## filehandle. We only want to use a default value if this is the | |
1535 | ## beginning of the entire document (but *not* if this is an included | |
1536 | ## file). | |
1537 | unless (defined $myData{_TOP_STREAM}) { | |
1538 | $out_fh = \*STDOUT unless (defined $out_fh); | |
1539 | $myData{_CUTTING} = 1; ## current "cutting" state | |
1540 | $myData{_INPUT_STREAMS} = []; ## stack of all input streams | |
1541 | } | |
1542 | ||
1543 | ## Initialize input indicators | |
1544 | $myData{_OUTFILE} = '(unknown)' unless (defined $myData{_OUTFILE}); | |
1545 | $myData{_OUTPUT} = $out_fh if (defined $out_fh); | |
1546 | $in_fh = \*STDIN unless (defined $in_fh); | |
1547 | $myData{_INFILE} = '(unknown)' unless (defined $myData{_INFILE}); | |
1548 | $myData{_INPUT} = $in_fh; | |
1549 | my $input_top = $myData{_TOP_STREAM} | |
1550 | = new Pod::InputSource( | |
1551 | -name => $myData{_INFILE}, | |
1552 | -handle => $in_fh, | |
1553 | -was_cutting => $myData{_CUTTING} | |
1554 | ); | |
1555 | local *input_stack = $myData{_INPUT_STREAMS}; | |
1556 | push(@input_stack, $input_top); | |
1557 | ||
1558 | ## Perform beginning-of-document and/or beginning-of-input processing | |
1559 | $self->begin_pod() if (@input_stack == 1); | |
1560 | $self->begin_input(); | |
1561 | ||
1562 | return $input_top; | |
1563 | } | |
1564 | ||
1565 | ##--------------------------------------------------------------------------- | |
1566 | ||
1567 | =begin _PRIVATE_ | |
1568 | ||
1569 | =head1 B<_pop_input_stream()> | |
1570 | ||
1571 | $hashref = $parser->_pop_input_stream(); | |
1572 | ||
1573 | This takes no arguments. It will perform any necessary end-of-file or | |
1574 | end-of-document processing and then pop the current input stream from | |
1575 | the top of the input stack. | |
1576 | ||
1577 | The value returned will be a reference to the hash-table that represents | |
1578 | the new top of the input stream stack (C<undef> if it is now empty). | |
1579 | ||
1580 | =end _PRIVATE_ | |
1581 | ||
1582 | =cut | |
1583 | ||
1584 | sub _pop_input_stream { | |
1585 | my ($self) = @_; | |
1586 | local *myData = $self; | |
1587 | local *input_stack = $myData{_INPUT_STREAMS}; | |
1588 | ||
1589 | ## Perform end-of-input and/or end-of-document processing | |
1590 | $self->end_input() if (@input_stack > 0); | |
1591 | $self->end_pod() if (@input_stack == 1); | |
1592 | ||
1593 | ## Restore cutting state to whatever it was before we started | |
1594 | ## parsing this file. | |
1595 | my $old_top = pop(@input_stack); | |
1596 | $myData{_CUTTING} = $old_top->was_cutting(); | |
1597 | ||
1598 |     ## Don't forget to reset the input indicators | |
1599 | my $input_top = undef; | |
1600 | if (@input_stack > 0) { | |
1601 | $input_top = $myData{_TOP_STREAM} = $input_stack[-1]; | |
1602 | $myData{_INFILE} = $input_top->name(); | |
1603 | $myData{_INPUT} = $input_top->handle(); | |
1604 | } else { | |
1605 | delete $myData{_TOP_STREAM}; | |
1606 | delete $myData{_INPUT_STREAMS}; | |
1607 | } | |
1608 | ||
1609 | return $input_top; | |
1610 | } | |
1611 | ||
1612 | ############################################################################# | |
1613 | ||
664bb207 GS |
1614 | =head1 TREE-BASED PARSING |
1615 | ||
1616 | If straightforward stream-based parsing won't meet your needs (as is | |
1617 | likely the case for tasks such as translating PODs into structured | |
1618 | markup languages like HTML and XML) then you may need to take the | |
1619 | tree-based approach. Rather than doing everything in one pass and | |
1620 | calling the B<interpolate()> method to expand sequences into text, it | |
1621 | may be desirable to instead create a parse-tree using the B<parse_text()> | |
d1be9408 | 1622 | method to return a tree-like structure which may contain an ordered |
664bb207 GS |
1623 | list of children (each of which may be a text-string, or a similar |
1624 | tree-like structure). | |
1625 | ||
1626 | Pay special attention to L<"METHODS FOR PARSING AND PROCESSING"> and | |
1627 | to the objects described in L<Pod::InputObjects>. The former describes | |
1628 | the gory details and parameters for how to customize and extend the | |
1629 | parsing behavior of B<Pod::Parser>. B<Pod::InputObjects> provides | |
1630 | several objects that may all be used interchangeably as parse-trees. The | |
1631 | most obvious one is the B<Pod::ParseTree> object. It defines the basic | |
1632 | interface and functionality that anything acting as a POD parse-tree | |
1633 | should provide. A B<Pod::ParseTree> is defined such that each "node" may be a | |
1634 | text-string, or a reference to another parse-tree. Each B<Pod::Paragraph> | |
1635 | object and each B<Pod::InteriorSequence> object also supports the basic | |
1636 | parse-tree interface. | |
1637 | ||
1638 | The B<parse_text()> method takes a given paragraph of text, and | |
1639 | returns a parse-tree that contains one or more children, each of which | |
1640 | may be a text-string, or an InteriorSequence object. There are also | |
1641 | callback-options that may be passed to B<parse_text()> to customize | |
1642 | the way it expands or transforms interior-sequences, as well as the | |
1643 | returned result. These callbacks can be used to create a parse-tree | |
1644 | with custom-made objects (which may or may not support the parse-tree | |
1645 | interface, depending on how you choose to do it). | |
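
As a minimal sketch of the default behaviour described above (assuming B<Pod::Parser> and B<Pod::InputObjects> are installed; the sample string and the printed labels are illustrative only), the following calls B<parse_text()> with no callback options and lists the children of the returned parse-tree:

```perl
use strict;
use warnings;
use Pod::Parser;

my $parser = Pod::Parser->new();

## Parse one paragraph of text; the second argument is its line number.
my $ptree = $parser->parse_text("Use B<bold> and C<code> here.", 1);

## Each child is either a plain string or a Pod::InteriorSequence object.
for my $child ($ptree->children) {
    if (ref $child) {
        printf "sequence %s: %s\n", $child->cmd_name, $child->raw_text;
    } else {
        printf "text: %s\n", $child;
    }
}
```

Passing a hash-ref of options as the first argument is where the callbacks mentioned above would go; they are left out here so that the sketch shows only the untransformed tree.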
1646 | ||
1647 | If you wish to turn an entire POD document into a parse-tree, that process | |
1648 | is fairly straightforward. The B<parse_text()> method is the key to doing | |
1649 | this successfully. Every paragraph-callback (i.e. the polymorphic methods | |
1650 | for B<command()>, B<verbatim()>, and B<textblock()> paragraphs) takes | |
1651 | a B<Pod::Paragraph> object as an argument. Each paragraph object has a | |
1652 | B<parse_tree()> method that can be used to get or set a corresponding | |
1653 | parse-tree. So for each of those paragraph-callback methods, simply call | |
1654 | B<parse_text()> with the options you desire, and then use the returned | |
1655 | parse-tree to assign to the given paragraph object. | |
1656 | ||
1657 | That gives you a parse-tree for each paragraph - so now all you need is | |
1658 | an ordered list of paragraphs. You can maintain that yourself as a data | |
1659 | element in the object/hash. The most straightforward way would be simply | |
1660 | to use an array-ref, with the desired set of custom "options" for each | |
1661 | invocation of B<parse_text>. Let's assume the desired option-set is | |
1662 | given by the hash C<%options>. Then we might do something like the | |
1663 | following: | |
1664 | ||
1665 | package MyPodParserTree; | |
1666 | ||
1667 | @ISA = qw( Pod::Parser ); | |
1668 | ||
1669 | ... | |
1670 | ||
1671 | sub begin_pod { | |
1672 | my $self = shift; | |
1673 | $self->{'-paragraphs'} = []; ## initialize paragraph list | |
1674 | } | |
1675 | ||
1676 | sub command { | |
1677 | my ($parser, $command, $paragraph, $line_num, $pod_para) = @_; | |
1678 | my $ptree = $parser->parse_text({%options}, $paragraph, ...); | |
1679 | $pod_para->parse_tree( $ptree ); | |
1680 | push @{ $parser->{'-paragraphs'} }, $pod_para; | |
1681 | } | |
1682 | ||
1683 | sub verbatim { | |
1684 | my ($parser, $paragraph, $line_num, $pod_para) = @_; | |
1685 | push @{ $parser->{'-paragraphs'} }, $pod_para; | |
1686 | } | |
1687 | ||
1688 | sub textblock { | |
1689 | my ($parser, $paragraph, $line_num, $pod_para) = @_; | |
1690 | my $ptree = $parser->parse_text({%options}, $paragraph, ...); | |
1691 | $pod_para->parse_tree( $ptree ); | |
1692 | push @{ $parser->{'-paragraphs'} }, $pod_para; | |
1693 | } | |
1694 | ||
1695 | ... | |
1696 | ||
1697 | package main; | |
1698 | ... | |
1699 | my $parser = new MyPodParserTree(...); | |
1700 | $parser->parse_from_file(...); | |
1701 | my $paragraphs_ref = $parser->{'-paragraphs'}; | |
1702 | ||
1703 | Of course, in this module author's humble opinion, it would be better to | |
1704 | use the existing B<Pod::ParseTree> object than a simple array. That way | |
1705 | everything in it, paragraphs and sequences alike, responds to the same core | |
1706 | interface for all parse-tree nodes. The result would look something like: | |
1707 | ||
1708 | package MyPodParserTree2; | |
1709 | ||
1710 | ... | |
1711 | ||
1712 | sub begin_pod { | |
1713 | my $self = shift; | |
1714 | $self->{'-ptree'} = new Pod::ParseTree; ## initialize parse-tree | |
1715 | } | |
1716 | ||
1717 | sub parse_tree { | |
1718 | ## convenience method to get/set the parse-tree for the entire POD | |
1719 | (@_ > 1) and $_[0]->{'-ptree'} = $_[1]; | |
1720 | return $_[0]->{'-ptree'}; | |
1721 | } | |
1722 | ||
1723 | sub command { | |
1724 | my ($parser, $command, $paragraph, $line_num, $pod_para) = @_; | |
1725 | my $ptree = $parser->parse_text({<<options>>}, $paragraph, ...); | |
1726 | $pod_para->parse_tree( $ptree ); | |
1727 | $parser->parse_tree()->append( $pod_para ); | |
1728 | } | |
1729 | ||
1730 | sub verbatim { | |
1731 | my ($parser, $paragraph, $line_num, $pod_para) = @_; | |
1732 | $parser->parse_tree()->append( $pod_para ); | |
1733 | } | |
1734 | ||
1735 | sub textblock { | |
1736 | my ($parser, $paragraph, $line_num, $pod_para) = @_; | |
1737 | my $ptree = $parser->parse_text({<<options>>}, $paragraph, ...); | |
1738 | $pod_para->parse_tree( $ptree ); | |
1739 | $parser->parse_tree()->append( $pod_para ); | |
1740 | } | |
1741 | ||
1742 | ... | |
1743 | ||
1744 | package main; | |
1745 | ... | |
1746 | my $parser = new MyPodParserTree2(...); | |
1747 | $parser->parse_from_file(...); | |
1748 | my $ptree = $parser->parse_tree; | |
1749 | ... | |
1750 | ||
1751 | Now you have the entire POD document as one great big parse-tree. You | |
1752 | can even use the B<-expand_seq> option to B<parse_text> to insert | |
1753 | whole different kinds of objects. Just don't expect B<Pod::Parser> | |
1754 | to know what to do with them after that. That will need to be in your | |
1755 | code. Or, alternatively, you can insert any object you like so long as | |
1756 | it conforms to the B<Pod::ParseTree> interface. | |
1757 | ||
1758 | One could use this to create subclasses of B<Pod::Paragraph> and | |
1759 | B<Pod::InteriorSequence> for specific commands (or to create your own | |
1760 | custom node-types in the parse-tree) and add some kind of B<emit()> | |
1761 | method to each custom node/subclass object in the tree. Then all you'd | |
1762 | need to do is recursively walk the tree in the desired order, processing | |
1763 | the children (most likely from left to right) by formatting them if | |
1764 | they are text-strings, or by calling their B<emit()> method if they | |
1765 | are objects/references. | |
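
The walk described above can be sketched without B<Pod::Parser> at all. In this minimal sketch, C<MyNode> and C<emit_children()> are hypothetical names invented for illustration: C<MyNode> merely stands in for a custom subclass of B<Pod::InteriorSequence> or B<Pod::Paragraph>, and a real tree would come from B<parse_text()> rather than being built by hand.

```perl
use strict;
use warnings;

## Hypothetical node class; only the pieces the walk needs are modelled.
package MyNode;

sub new {
    my ($class, $tag, @children) = @_;
    return bless { tag => $tag, children => [@children] }, $class;
}

sub children { return @{ $_[0]{children} } }

## Render this node by wrapping its rendered children in an HTML-like tag.
sub emit {
    my ($self) = @_;
    my $inner = main::emit_children($self->children);
    return "<$self->{tag}>$inner</$self->{tag}>";
}

package main;

## Walk the children left to right: plain strings are copied through,
## objects are asked to emit themselves (which recurses into their children).
sub emit_children {
    return join '', map { ref $_ ? $_->emit() : $_ } @_;
}

my @tree = (
    'Some ',
    MyNode->new('b', 'bold and ', MyNode->new('i', 'nested')),
    ' text',
);

print emit_children(@tree), "\n";
## prints: Some <b>bold and <i>nested</i></b> text
```

Keeping the formatting of plain strings in one place (here, C<emit_children()>) means that escaping or re-wrapping text can be changed later without touching any node class.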
1766 | ||
267d5541 SP |
1767 | =head1 CAVEATS |
1768 | ||
1769 | Please note that POD has the notion of "paragraphs": a paragraph is | |
1770 | something that starts I<after> a blank (read: empty) line, the single | |
1771 | exception being the start of the file, which also starts a paragraph. In | |
1772 | particular, a command (e.g. C<=head1>) I<must> be preceded by a blank | |
1773 | line; C<__END__> is I<not> a blank line. | |
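
The rule can be illustrated with a simplified stand-in. This is a sketch only, not the code B<Pod::Parser> itself uses, and C<command_names()> is a hypothetical helper. It splits the input on empty lines and treats a paragraph as a command only if its first character is C<=>:

```perl
use strict;
use warnings;

## Return the names of all command paragraphs found in $text, where
## paragraphs are separated by one or more empty lines.
sub command_names {
    my ($text) = @_;
    my @names;
    for my $para (split /\n{2,}/, $text) {
        push @names, $1 if $para =~ /\A=([A-Za-z]\w*)/;
    }
    return @names;
}

## An empty line after __END__ lets =head1 start its own paragraph.
my $good = "__END__\n\n=head1 NAME\n\nfoo - does things\n\n=cut\n";

## Without it, =head1 is just the second line of the __END__ paragraph.
my $bad  = "__END__\n=head1 NAME\n\nfoo - does things\n\n=cut\n";

print join(',', command_names($good)), "\n";   ## prints: head1,cut
print join(',', command_names($bad)),  "\n";   ## prints: cut
```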
1774 | ||
360aca43 GS |
1775 | =head1 SEE ALSO |
1776 | ||
1777 | L<Pod::InputObjects>, L<Pod::Select> | |
1778 | ||
1779 | B<Pod::InputObjects> defines POD input objects corresponding to | |
1780 | command paragraphs, parse-trees, and interior-sequences. | |
1781 | ||
1782 | B<Pod::Select> is a subclass of B<Pod::Parser> which provides the ability | |
1783 | to selectively include and/or exclude sections of a POD document from being | |
1784 | translated based upon the current heading, subheading, subsubheading, etc. | |
1785 | ||
1786 | =for __PRIVATE__ | |
1787 | B<Pod::Callbacks> is a subclass of B<Pod::Parser> which gives its users | |
1788 | the ability to employ I<callback functions> instead of, or in addition | |
1789 | to, overriding methods of the base class. | |
1790 | ||
1791 | =for __PRIVATE__ | |
1792 | B<Pod::Select> and B<Pod::Callbacks> do not override any | |
1793 | methods nor do they define any new methods with the same name. Because | |
1794 | of this, they may I<both> be used (in combination) as a base class of | |
1795 | the same subclass in order to combine their functionality without | |
1796 | causing any namespace clashes due to multiple inheritance. | |
1797 | ||
1798 | =head1 AUTHOR | |
1799 | ||
aaa799f9 NC |
1800 | Please report bugs using L<http://rt.cpan.org>. |
1801 | ||
360aca43 GS |
1802 | Brad Appleton E<lt>bradapp@enteract.comE<gt> |
1803 | ||
1804 | Based on code for B<Pod::Text> written by | |
1805 | Tom Christiansen E<lt>tchrist@mox.perl.comE<gt> | |
1806 | ||
1807 | =cut | |
1808 | ||
1809 | 1; | |
d5c61f7c | 1810 | # vim: ts=4 sw=4 et |