Edinburgh Speech Tools  2.1-release
XML support

There are three levels of support for XML with EST.

Reading XML Text As An EST_Utterance

To read XML marked-up text, the EST code must be told how the XML markup relates to the utterance structure. This is done by annotating the DTD with which the text is processed.

There are two ways to annotate the DTD. Either a new DTD can be created with the annotations added, or the annotations can be included in the XML file itself.

A new DTD

To write a new DTD based on an existing one, you should include the existing one as follows:

<!-- Extended FooBar DTD for speech tools -->
<!-- Include original FooBar DTD -->
<!ENTITY % OldFooBarDTD PUBLIC "//Foo//DTD Bar"
"http://www.foo.org/dtds/org.dtd">
%OldFooBarDTD;
<!-- Your extensions, for instance... -->
<!-- syn-node elements are nodes in the Syntax relation -->
<!ATTLIST syn-node relationNode CDATA #FIXED "Syntax" >

In the XML file

Extensions to the DTD can be included in the !DOCTYPE declaration in the marked-up text. For instance:

<?xml version='1.0'?>
<!DOCTYPE utterance PUBLIC "//Foo//DTD Bar"
"http://www.foo.org/dtds/org.dtd"
[
<!-- Item elements are nodes in the Syntax relation -->
<!ATTLIST item relationNode CDATA #FIXED "Syntax" >
]>
<utterance>
<!-- Actual markup starts here -->

Summary of DTD Annotations

The following attributes may be added to elements in your DTD to describe their relation to EST_Utterance structures.

The XML_Parser_Class C++ Class

The C++ class XML_Parser_Class (declared in rxp/XML_Parser.h) defines an abstract interface to the XML parsing process. By creating a sub-class of XML_Parser_Class you can write code to read XML marked-up text quite simply.

Some Definitions

Creating An XML Processing Procedure

To create a procedure which will process XML marked-up text in the manner of your choice, you need to do four things, each described in a section below. Simple examples can be found in testsuite/xml_example.cc and main/xml_parser_main.cc.
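The overall shape of those four steps can be sketched without the EST library. The code below is a minimal, self-contained illustration only: Sketch_Parser_Class, its callbacks, register_id, resolve, Parse_State and run_sketch are hypothetical stand-ins invented for this sketch, not the declarations in rxp/XML_Parser.h, and the "parser" simply replays a fixed sequence of events rather than reading a file. For the real interface, consult the header and the two example programs named above.

```cpp
#include <cassert>
#include <regex>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a parser-class interface. This is NOT the EST
// API; it only mimics the idea of overriding callbacks in a sub-class.
class Sketch_Parser_Class {
public:
    virtual ~Sketch_Parser_Class() = default;

    // Step 3: register a rule turning entity IDs that match `pattern`
    // (an ECMAScript regex) into a filename built from `result`.
    void register_id(const std::string &pattern, const std::string &result) {
        rules_.emplace_back(std::regex(pattern), result);
    }

    // Apply the first matching rule; return the ID unchanged if none match.
    std::string resolve(const std::string &id) const {
        for (const auto &rule : rules_) {
            if (std::regex_match(id, rule.first))
                return std::regex_replace(id, rule.first, rule.second);
        }
        return id;
    }

    // Callbacks invoked during the parse. The defaults do nothing, so a
    // sub-class overrides only the ones it needs.
    virtual void element_open(void *, const std::string &) {}
    virtual void element_close(void *, const std::string &) {}
    virtual void pcdata(void *, const std::string &) {}

private:
    std::vector<std::pair<std::regex, std::string>> rules_;
};

// Step 2: a structure holding the state of the parse.
struct Parse_State {
    int depth = 0;      // current element nesting depth
    int max_depth = 0;  // deepest nesting seen so far
    std::string text;   // character data collected so far
};

// Step 1: a sub-class whose callbacks update the parse state.
class My_Parser_Class : public Sketch_Parser_Class {
public:
    void element_open(void *data, const std::string &) override {
        auto *state = static_cast<Parse_State *>(data);
        ++state->depth;
        if (state->depth > state->max_depth)
            state->max_depth = state->depth;
    }
    void element_close(void *data, const std::string &) override {
        --static_cast<Parse_State *>(data)->depth;
    }
    void pcdata(void *data, const std::string &text) override {
        static_cast<Parse_State *>(data)->text += text;
    }
};

// Step 4: a procedure that starts the parse. A real parser would read a
// document; this toy driver replays the events for
// <utterance><item>hello</item></utterance>.
Parse_State run_sketch(Sketch_Parser_Class &pclass) {
    Parse_State state;
    pclass.element_open(&state, "utterance");
    pclass.element_open(&state, "item");
    pclass.pcdata(&state, "hello");
    pclass.element_close(&state, "item");
    pclass.element_close(&state, "utterance");
    assert(state.depth == 0);  // every opened element was closed
    return state;
}
```

With this sketch, calling register_id("//Foo//DTD (.*)", "dtds/$1.dtd") on a My_Parser_Class object makes resolve("//Foo//DTD Bar") return "dtds/Bar.dtd", and run_sketch returns a state whose max_depth is 2 and whose text is "hello". Passing the state as an untyped pointer keeps the base class independent of any particular application's state structure, at the cost of a cast in each callback.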

Create a Sub-Class of XML_Parser_Class

Not written

Create a Structure Holding the State of the Parse

Not written

Decide How Entity IDs Should Be Converted To Filenames

Not written

Write A Procedure To Start The Parser

Not written

The XML_Parser_Class in Detail

Not written

The RXP XML Parser

Included in the EST library is a version of the RXP XML parser. This version is limited to 8-bit characters for consistency with the rest of EST. For more details, see the RXP documentation.

Insert reference to RXP documentation here.