Edinburgh Speech Tools  2.1-release
doc/estxml.md
1 XML support {#estxml}
2 ================
3 
4 There are three levels of support for XML with EST.
5 
6  - *Loading as an Utterance*: A built in XML parser allows text marked up
7  according to an XML DTD to be read into an EST_Utterance (see \ref xmltoutterance).
8 
9  - *XML_Parser_Class*: A C++ class XML_Parser_Class which makes it
10  relatively simple to write specialised XML processing code.
11  - *RXP*: The RXP XML parser is included and can be used
12  directly (\ref rxpparser).
13 
14 # Reading XML Text As An EST_Utterance {#xmltoutterance}
15 
16 In order to read XML marked up text, the EST code must be
17 told how the XML markup should relate to the utterance
18 structure. This is done by annotating the DTD against which the
19 text is processed.
20 
21 There are two possible ways to annotate the DTD. Either a new
22 DTD can be created with the annotations added, or the
23 annotations can be included in the XML file.
24 
25 **A new DTD**
26 
27 To write a new DTD based on an existing one, you should include
28 the existing one as follows:
29 @code{.xml}
30  <!-- Extended FooBar DTD for speech tools -->
31 
32  <!-- Include original FooBar DTD -->
33  <!ENTITY % OldFooBarDTD PUBLIC "//Foo//DTD Bar"
34  "http://www.foo.org/dtds/org.dtd">
35  %OldFooBarDTD;
36 
37  <!-- Your extensions, for instance... -->
38 
39  <!-- syn-node elements are nodes in the Syntax relation -->
40  <!ATTLIST syn-node estRelationNode CDATA #FIXED "Syntax" >
41 @endcode
42 
43 **In the XML file**
44 
45 Extensions to the DTD can be included in the
46 `!DOCTYPE` declaration in the marked up
47 text. For instance:
48 @code{.xml}
49  <?xml version='1.0'?>
50  <!DOCTYPE utterance PUBLIC "//Foo//DTD Bar"
51  "http://www.foo.org/dtds/org.dtd"
52 [
53 <!-- Item elements are nodes in the Syntax relation -->
54 <!ATTLIST item estRelationNode CDATA #FIXED "Syntax" >
55 ]>
56 
57  <utterance>
58  <!-- Actual markup starts here -->
59 @endcode
60 
61 ## Summary of DTD Annotations
62 
63 The following attributes may be added to elements in your
64 DTD to describe their relation to EST_Utterance structures.
65 
66  - *estUttFeats*: The value should be a comma separated list of
67  attributes which should be set as features on the
68  utterance. Each attribute can be either a simple
69  identifier, or two identifiers separated by a colon `:`.
70 
71  A value `foo:bar` causes the value of
72  the `foo` attribute of the element to be
73  set as the value of the Utterance feature `bar`.
74 
75  A simple identifier `foo` causes the
76  `foo` attribute of the element to be
77  set as the value of the Utterance feature
78  `X_foo` where `X` is the
79  name of the element.
80 
81  - *estRelationFeat*: The value should be a comma separated list of
82  attributes which should be set as features on the
83  relation related to this element. Its format and
84  meaning are the same as for `estUttFeats`.
85 
86  - *estRelationElementAttr*: Indicates that this element defines a relation. All
87  elements inside this one will be made nodes in the
88  relation, unless they are explicitly marked to be
89  ignored by *estRelationIgnore*. The
90  value of the *estRelationElementAttr*
91  attribute is the name of an attribute which gives the
92  name of the relation.
93 
94  - *estRelationTypeAttr*: When an element has an
95  *estRelationElementAttr* tag to indicate that its
96  content defines a relation, it may also have the
97  *estRelationTypeAttr* tag. This gives
98  the name of an attribute which gives the type of
99  relation. Currently a type of `list` or `linear`
100  gives a linear relation; anything else gives a tree.
101 
102  - *estRelationIgnore*: If this is set to any value on an element which would
103  otherwise be interpreted as an EST_Item in the current
104  relation, the element is passed over. The contents
105  will be processed as if they had been directly inside
106  this element's parent.
107 
108  - *estRelationNode*: When placed on an element, indicates that this element
109  is to be interpreted as an item in the relation named
110  in the value of the attribute.
111 
112  - *estExpansion*: The value of this attribute defines how ranges in
113  *href* attributes are expanded for
114  this element. If the value is `replace`
115  the nodes created during expansion are placed at the
116  same level in the hierarchy as the original element. If
117  the value is `embed` they are created as
118  children of a new node.
119 
120  - *estContentFeature*: The value of this attribute is the feature which is set
121  to the contents of the current element.
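
As a rough illustration of how these annotations combine, the following
sketch annotates a small, hypothetical document type in its internal DTD
subset. The element names (`utterance`, `rel`, `group`, `w`) and the
attribute names (`lang`, `id`, `name`, `type`) are invented for this
example, and the behaviour noted in the comments is what the descriptions
above imply, not something verified against a particular EST release.
@code{.xml}
 <?xml version='1.0'?>
 <!DOCTYPE utterance [
 <!-- Hypothetical annotations, for illustration only -->

 <!-- "lang" is copied to the utterance feature "language";
      "id" has no target name, so it becomes "utterance_id" -->
 <!ATTLIST utterance estUttFeats CDATA #FIXED "lang:language,id" >

 <!-- A rel element defines a relation: its "name" attribute names
      the relation and its "type" attribute selects list or tree -->
 <!ATTLIST rel estRelationElementAttr CDATA #FIXED "name"
               estRelationTypeAttr    CDATA #FIXED "type" >

 <!-- group elements are passed over; their contents are treated
      as if they were directly inside the parent -->
 <!ATTLIST group estRelationIgnore CDATA #FIXED "1" >

 <!-- The text inside each w element is stored in its "name" feature -->
 <!ATTLIST w estContentFeature CDATA #FIXED "name" >
 ]>

 <utterance lang="en" id="u1">
   <rel name="Word" type="list">
     <group>
       <w>hello</w>
       <w>world</w>
     </group>
   </rel>
 </utterance>
@endcode

Read with these annotations, the utterance would be expected to carry the
features `language` and `utterance_id`, together with a linear relation
named `Word` holding two items whose `name` features are set from the
element contents. The `group` wrapper contributes no item of its own.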
122 
123 # The XML_Parser_Class C++ Class {#xmlparserclass}
124 
125 The C++ class XML_Parser_Class
126 (declared in \ref rxp/XML_Parser.h) defines an
127 abstract interface to the XML parsing process. By
128 creating a sub-class of XML_Parser_Class you can create code to
129 read XML marked up text quite simply.
130 
131 ## Some Definitions {#xmlparserclassdefinitions}
132 
133  - An XML parser is an object which can
134  analyse a piece of text marked up according to an XML
135  doctype and perform actions based on the markup. One
136  XML parser deals with one text.
137 
138  - An XML parser is represented by an instance of the
139  class XML_Parser.
140 
141  - An XML parser class is an object from which
142  XML parsers can be created. It defines the behaviour of
143  the parsers when they process their assigned text, and
144  also a mapping from XML entity IDs to places to look
145  for them.
146 
147  - An XML parser class is represented by an instance of
148  XML_Parser_Class or a subclass of XML_Parser_Class.
149 
150 ## Creating An XML Processing Procedure {#xmlcreatingxmlproc}
151 
152 In order to create a procedure which will process XML
153 marked up text in the manner of your choice you need to do four
154 things. Simple examples can be found in \ref testsuite/xml_example.cc and
155 \ref main/xml_parser_main.cc.
156 
157 
158 ### Create a Sub-Class of XML_Parser_Class
159 
160 Not written
161 
162 ### Create a Structure Holding the State of the Parse
163 
164 Not written
165 
166 ### Decide How Entity IDs Should Be Converted To Filenames
167 
168 Not written
169 
170 ### Write A Procedure To Start The Parser
171 
172 Not written
173 
174 ## The XML_Parser_Class in Detail
175 
176 Not written
177 
178  - XMLParser
179 
180 # The RXP XML Parser {#rxpparser}
181 
182 Included in the EST library is a version of the *RXP XML parser*.
183 This version is limited to 8-bit characters for consistency with the rest of
184 EST. For more details, see the *RXP* documentation.
185 
186 Insert reference to *RXP* documentation here.
187 
188 