SGML::Parser - SGML instance parser
    package MyParser;
    use SGML::Parser;
    @ISA = qw( SGML::Parser );

    sub cdata { ... }
    sub char_ref { ... }
    sub comment_decl { ... }
    sub end_tag { ... }
    sub entity_ref { ... }
    sub ignored_data { ... }
    sub marked_sect_close { ... }
    sub marked_sect_open { ... }
    sub parm_entity_ref { ... }
    sub processing_inst { ... }
    sub start_tag { ... }
    sub error { ... }

    $myparser = new MyParser;
    $myparser->parse_data(\*FILEHANDLE);
SGML::Parser is a simple SGML instance parser; it cannot parse document type declarations. To use the class, you create a derived class of SGML::Parser and redefine the various methods invoked when certain events occur during parsing.
The following class methods are defined:
The new method instantiates a new parser object. Given how SGML::Parser is defined, an SGML::Parser object will probably never be instantiated directly, but a derived class will be. The new method is implemented so that derived classes can reuse it; redefining it is not required unless the derived class must perform custom initialization beyond what SGML::Parser performs.
The following methods are defined by SGML::Parser and should not be overridden:
$parser->parse_data( $fh, $label, $init_buf, $line_no);
Return value: normally undef. However, if parsing was aborted while data remained in the current buffer, the buffer's contents are returned.
parse_data parses an SGML instance. During parsing, the callback methods are invoked as the corresponding lexical constructs are encountered.
$parser->get_line_no();
Arguments: N/A
Return value: The current line number.
Returns the current line number of the input being parsed. This method is useful within callback methods.
$parser->get_input_label()
Arguments: N/A
Return value: Label string.
Retrieves the label given to the input being parsed. The label is defined when the parse_data method is called. This method is useful within callback methods.
The following methods are intended to be redefined by a derived class to handle the processing events generated by the parse_data method.
$parser->cdata($data);
Return value: N/A
cdata is invoked when character data is encountered. The character data is passed into the method. Multiple lines of character data may generate multiple cdata calls.
$parser->char_ref($value);
Return value: N/A
char_ref is invoked when a character reference is encountered. The number/name of the character reference is passed in as an argument.
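As an illustration, a derived class might translate character references into literal characters. The following is only a sketch under stated assumptions: the package name CharRefDemo, the %Named table, and the text field are hypothetical, and the method is called directly rather than through parse_data so that the example runs without perlSGML installed. A real parser class would inherit from SGML::Parser as shown in the synopsis.

```perl
package CharRefDemo;
use strict;
use warnings;

# Hypothetical table of named character references (function names).
# Which names actually occur depends on the SGML declaration in use.
my %Named = (
    SPACE => ' ',
    TAB   => "\t",
    RE    => "\r",    # record end
    RS    => "\n",    # record start
);

# Plain constructor for this demo only; a real derived class would
# inherit new() from SGML::Parser.
sub new { return bless { text => '' }, shift }

# Callback: $value is the number or name of the character reference.
sub char_ref {
    my ($this, $value) = @_;
    if ($value =~ /^\d+$/) {
        $this->{text} .= chr($value);          # numeric reference
    } elsif (exists $Named{uc $value}) {
        $this->{text} .= $Named{uc $value};    # named reference
    } else {
        $this->{text} .= "&#$value;";          # unknown: keep verbatim
    }
    return;
}

package main;

# Simulate the calls the parser would make for "&#72;&#105;&#SPACE;&#33;".
my $char_demo = CharRefDemo->new;
$char_demo->char_ref($_) for (72, 105, 'SPACE', 33);
print $char_demo->{text}, "\n";    # prints "Hi !"
```

Keeping unknown references verbatim is a design choice of this sketch; an application could just as well report them through the error method.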
$parser->comment_decl(\@comments);
Return value: N/A
comment_decl is called when a comment declaration is parsed. The argument is a reference to an array containing the comment blocks defined in the declaration.
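Because the argument is an array reference, a handler has to dereference it to reach the individual comment blocks. The sketch below shows one way to do that. It assumes each block arrives as plain text without the surrounding -- delimiters, which the description above does not state; the package name CommentDemo and the comments field are hypothetical, and the method is called directly so the example runs without perlSGML installed.

```perl
package CommentDemo;
use strict;
use warnings;

# Plain constructor for this demo only; a real derived class would
# inherit new() from SGML::Parser.
sub new { return bless { comments => [] }, shift }

# Callback: $comments is a reference to an array of comment blocks.
sub comment_decl {
    my ($this, $comments) = @_;
    for my $block (@$comments) {
        (my $text = $block) =~ s/^\s+|\s+$//g;    # trim surrounding whitespace
        push @{ $this->{comments} }, $text;
    }
    return;
}

package main;

# Simulate a declaration holding two comment blocks,
# such as <!-- first -- -- second -->.
my $comment_demo = CommentDemo->new;
$comment_demo->comment_decl([' first ', ' second ']);
print join('|', @{ $comment_demo->{comments} }), "\n";    # prints "first|second"
```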
$parser->end_tag($gi);
Return value: N/A
end_tag is called when an end tag is encountered. The generic identifier of the end tag is passed in as an argument. The value may be the empty string if the end tag is a null end tag.
$parser->entity_ref($name);
Return value: Any text that should be parsed further.
entity_ref is called for entity references. The name of the entity is passed in as an argument. If any data is returned by this method, the data will be prepended to the parse buffer and parsed.
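A handler might resolve entities from a lookup table and hand the replacement text back to the parser. The sketch below is one possible shape for that, not the package's own implementation: the package name EntityDemo, the %Entity table, and the unresolved field are hypothetical. It returns an empty string for unknown entities on the assumption that the parser treats this as no data, and it is called directly so the example runs without perlSGML installed.

```perl
package EntityDemo;
use strict;
use warnings;

# Hypothetical general entity table; a real application might fill it
# from a DTD or some other source.
my %Entity = (
    product => '<em>perlSGML</em>',
    version => '1.0',
);

# Plain constructor for this demo only; a real derived class would
# inherit new() from SGML::Parser.
sub new { return bless { unresolved => [] }, shift }

# Callback: return replacement text that the parser should parse next.
sub entity_ref {
    my ($this, $name) = @_;
    return $Entity{$name} if exists $Entity{$name};
    push @{ $this->{unresolved} }, $name;    # remember unknown entities
    return '';                               # assumed to mean "no data"
}

package main;

my $entity_demo = EntityDemo->new;
print $entity_demo->entity_ref('product'), "\n";    # prints "<em>perlSGML</em>"
$entity_demo->entity_ref('nosuch');
print "unresolved: @{ $entity_demo->{unresolved} }\n";    # prints "unresolved: nosuch"
```

Since the returned text is parsed again, markup in the replacement text (the em element here) would trigger further start_tag and end_tag callbacks.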
$parser->error(@msg);
Return value: N/A
error is called when any error occurs during parsing. The default implementation prints the error message (which can be a list of strings), prefixed with the class name, the input label, and the line number at which the method was called.
$parser->ignored_data($data);
Return value: N/A
ignored_data is called for data that is in an IGNORE marked section.
$parser->marked_sect_close();
Return value: N/A
marked_sect_close is called when a marked section close is encountered.
$parser->marked_sect_open( $status_keyword, $status_spec);
Return value: N/A
marked_sect_open is called when a marked section open is encountered. The $status_keyword argument is the status keyword for the marked section (e.g., INCLUDE, IGNORE). The $status_spec argument is the original status specification text. It may be equal to $status_keyword, or it may contain a parameter entity reference. If it contains a parameter entity reference, the parm_entity_ref method was called to determine the value of the $status_keyword argument.
$parser->parm_entity_ref($name);
Return value: Replacement text.
parm_entity_ref is called to resolve parameter entity references. Currently, it is only invoked if a parameter entity reference is encountered in a marked section open. The return value should contain the value of the parameter entity reference.
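Since the method is currently used only for marked section status keywords, a minimal handler can map parameter entity names to keywords. The sketch below assumes a hypothetical %ParmEntity table and package name MarkedSectDemo. Falling back to IGNORE for unknown names is a policy choice of this example, not documented behavior. The method is called directly so the example runs without perlSGML installed.

```perl
package MarkedSectDemo;
use strict;
use warnings;

# Hypothetical parameter entities controlling conditional sections,
# as in <![ %draft; [ ... ]]>.
my %ParmEntity = (
    draft => 'IGNORE',
    final => 'INCLUDE',
);

# Plain constructor for this demo only; a real derived class would
# inherit new() from SGML::Parser.
sub new { return bless {}, shift }

# Callback: return the replacement text of parameter entity $name.
sub parm_entity_ref {
    my ($this, $name) = @_;
    return $ParmEntity{$name} if exists $ParmEntity{$name};
    return 'IGNORE';    # policy choice of this sketch for unknown names
}

package main;

my $ms_demo = MarkedSectDemo->new;
print $ms_demo->parm_entity_ref('draft'), "\n";    # prints "IGNORE"
print $ms_demo->parm_entity_ref('final'), "\n";    # prints "INCLUDE"
```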
$parser->processing_inst($data);
Return value: N/A
processing_inst is called for processing instructions. $data is the content of the processing instruction.
$parser->start_tag($gi, $attr_spec);
Return value: N/A
start_tag is called for start tags. $gi is the generic identifier of the start tag. $attr_spec is the attribute specification list string. The SGMLparse_attr_spec function defined in SGML::Util can be used to parse the string into name/value pairs.
SGML::Parser uses parser modes to determine how to analyze the input data. Mode switching is automatic in most cases. However, since SGML parsing rules can change depending on the content model of elements, callback methods can force a mode change. This is normally done when a start tag is encountered (which invokes the start_tag method) and the element it represents should be parsed as if it has CDATA or RCDATA content. The following code example shows how you can change parsing modes:
    sub start_tag {
        my $this = shift;
        my $gi = uc shift;
        my $attr_spec = shift;

        if ($gi eq 'LITERAL-TEXT') {
            $this->{'mode'} = $SGML::Parser::ModeCData;
        } elsif ($gi eq 'EX') {
            $this->{'mode'} = $SGML::Parser::ModeRCData;
        }
        # ...
    }
The element names are arbitrary; the example only shows how a callback method can switch parsing modes. SGML::Parser will change the mode when an end tag is encountered.
perl(1)
This software is part of the perlSGML package; see http://www.oac.uci.edu/indiv/ehood/perlSGML.html