Up to now we have used an abbreviated notation for XML elements (and their types),
as in <table align="center" valign="top">[some_content]. Actually, the precise syntax of XML elements is
<(expr1) (expr2)>expr3
where expr1, expr2, and expr3 are generic
expressions. The same holds for patterns for XML elements, where
the generic expressions are replaced by generic patterns (that is, <(p1) (p2)>p3). It is important
to notice that the parentheses in <(expr1) (expr2)>expr3 are part of the syntax.
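As a minimal sketch of the general form, the tag and the attributes below are computed by ordinary expressions bound to variables. The names tag, attrs, and elem are illustrative and do not come from the tutorial.

```cduce
(* The tag is an atom and the attributes are a record, both held in variables. *)
let tag = `table
let attrs = { align="center"; valign="top" }

(* General form <(expr1) (expr2)>expr3: the parentheses are required here. *)
let elem = <(tag) (attrs)>[ ]
```

Under these assumptions, elem should denote the same value as <table align="center" valign="top">[ ].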
Although expr1, expr2, and expr3 may be any expressions, in practice they mostly occur in a very
precise form: expr1 is an atom, expr2 is a record value, and
expr3 is a sequence. Since this is by far the most
common use of XML elements, we have introduced some handy abbreviations: in
particular, we allow the programmer to omit the surrounding (` )
when expr1 is an atom, and to omit the surrounding { }
and the infix semicolons ; when
expr2 is a record value. This is why we can write
<table align="center" valign="top">[ ... ] rather than
<(`table) ({align="center"; valign="top"})>[ ... ].
While these abbreviations are quite handy, they demand some care when used in patterns. As we said, the general form of a pattern for XML elements is
<(p1) (p2)>p3
and the same abbreviations as for expressions apply. In particular, this means
that, say, the pattern <t (a)>_ stands for <(`t)
(a)>_. Therefore while <t (a)>_ matches all the elements
of tag t (and captures in the variable a the
attributes), the pattern <(t) (a)>_ matches all XML elements
(whatever their tag is) and captures their tag in the variable t
(and their attributes in a). Another point to notice is that
<t>_ stands for <t ({})>_ (more precisely, for
<(`t) ({})>_). Since {} is the closed empty
record type, it matches only the empty record. Therefore <t>_
matches only the elements of tag t that have no attributes. We have seen at the beginning of this tutorial that in order to match all elements of tag t, regardless of whether they have attributes, we have to use the pattern <t ..>_ (which stands for <(`t) ({..})>_).
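The four patterns discussed above can be compared side by side in the following sketch. The function names (attrs_of_t, tag_of, is_bare_t, is_t) and the fallback atom `none are hypothetical, introduced only for illustration.

```cduce
(* <t (a)>_ : the tag must be `t; a captures the attribute record. *)
let attrs_of_t (Any -> Any)
    <t (a)>_ -> a
  | _ -> `none

(* <(t) (_)>_ : t is now a capture variable, so any element matches. *)
let tag_of (Any -> Any)
    <(t) (_)>_ -> t
  | _ -> `none

(* <t>_ stands for <(`t) ({})>_ : tag `t and no attributes at all. *)
let is_bare_t (Any -> Bool)
    <t>_ -> `true
  | _ -> `false

(* <t ..>_ stands for <(`t) ({..})>_ : tag `t with any attributes. *)
let is_t (Any -> Bool)
    <t ..>_ -> `true
  | _ -> `false
```

For instance, is_bare_t <t id="1">[] is expected to be `false while is_t <t id="1">[] is expected to be `true, and tag_of <u>[] is expected to return the atom `u.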
In the following we enumerate some simple examples to show
what we just explained. In these examples we use the following definitions for bibliographic data:
type Biblio = [(Paper|Book)*]
type Paper = <paper isbn=?String year=String>[ Author+ Title Conference Url? ]
type Book = <book isbn=String> [ Author+ Title Url? ]
type Author = <author>[ PCDATA ]
type Title = <title>[ PCDATA ]
type Conference = <conference>[ PCDATA ]
type Url = <url>[ PCDATA ]
Let bib be of type Biblio. Then
transform bib with
<book (a)> [ (x::(Any\Url)|_)* ] -> [ <book (a)> x ]
returns the list of all books without their Url element (if any).
transform bib with
<(book) (a)> [ (x::(Any\Url)|_)* ] -> [ <(book) (a)> x ]
returns the bibliography in which all entries (either books or papers) no longer
have their Url elements (book is now a capture variable). Equivalently we could have
pushed the difference on tags:
transform bib with
<(book) (a)> [ (x::<(Any\`url)>_|_)* ] -> [ <(book) (a)> x ]
We can perform many kinds of manipulations on the attributes by
using the operators for records,
namely r\l which deletes the field l
in the record r whenever it is present, and r1 +
r2 which merges the records r1 and
r2 by giving the priority to the fields in the latter. For
instance
transform bib with
<(t) (a)> x -> [ <(t) (a\isbn)> x ]
strips all the ISBN attributes.
transform bib with
<_ (a)> [(x::(Author|Title|Url)|_)*] -> [ <book ({isbn="fake"}+a\year)> x ]
returns the bibliography in which all Paper elements are transformed into
books; this is done by forgetting the Conference elements, by removing the year attributes, and
possibly by adding a fake isbn attribute. Note that since record concatenation gives priority to the record on the right-hand side, whenever the record captured by
a already contains an isbn attribute, that attribute is preserved.
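The behavior of the two record operators can be checked on a small record. This is a sketch under the semantics stated above; the record r and the other binding names are made up for illustration.

```cduce
let r = { isbn="123"; year="2001" }

(* r\l deletes the field l when it is present. *)
let no_isbn = r \ isbn              (* only the year field remains *)
let same    = r \ url               (* r has no url field: nothing to delete *)

(* r1 + r2 gives priority to the fields of the right operand r2. *)
let kept    = { isbn="fake" } + r   (* isbn stays "123" *)
let forced  = r + { isbn="fake" }   (* isbn becomes "fake" *)
```

The binding kept mirrors the transformation above: the fake isbn only survives when the record on the right has no isbn field of its own.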
As an example to summarize what we said above, consider the elements
table, td and tr in XHTML. In
transitional XHTML these elements can have an attribute bgcolor,
which is deprecated: in strict XHTML the background color must be specified
by the style attribute. So, for instance, <table
bgcolor="#ffff00" style="font-family:Arial">...
must be rewritten as <table style="font-family:Arial;
bgcolor:#ffff00">... to be XHTML strict compliant. Here is a function
that performs this transformation on a very simplified version of possibly nested
tables containing strings.
type Table = <table { bgcolor=?String; style=?String }>[ Tr+]
type Tr = <tr { bgcolor=?String; style=?String }>[ Td+]
type Td = <td { bgcolor=?String; style=?String }>[ Table* | PCDATA ]
let strict ([Table*]->[Table*]; [Tr+]->[Tr+]; [Td+]->[Td+]; [PCDATA]->[PCDATA])
  x ->
    map x with
        <(t) (a & { bgcolor=c; style=s })> l
          -> <(t) (a\bgcolor+{style=(s@"; bgcolor:"@c)})>(strict l)
      | <(t) (a & { bgcolor=c })> l
          -> <(t) (a\bgcolor+{style=("bgcolor:"@c)})>(strict l)
      | <(t) (a)> l -> <(t) (a)>(strict l)
      | c -> c
As an exercise, the reader can try to rewrite the function strict so that the first three branches of the map are condensed into a single branch.