* | XML Schema | | Overview |
CDuce partially supports the XML
Schema Recommendations (Primer, Structures, Datatypes). With this CDuce
feature it is possible to manipulate XML documents whose leaves are typed
values such as integers, dates, binary data, and so on.
CDuce supports XML Schema through the following features: import of XML Schema components, XML Schema validation, and output of XML Schema instances.
This manual page describes how to use these features in CDuce. All the
documents used in the examples are available in the manual section: XML Schema sample documents.
Note:
The support for XML Schema does not currently interact well with
separate compilation. When a CDuce unit script.cd
which uses an XML Schema
is compiled, the resulting script.cdo object
refers to the XML Schema by name. That is, when these units
are run, the XML Schema must still be available from the current
directory and must not have been changed since compilation.
|
| XML Schema components (micro) introduction |
An XML Schema document can define four different kinds of components, each
of which can be imported in CDuce and used as a CDuce type:
- Type definitions
A type definition defines either a simple type or a complex type. The
former can be used to type more precisely the string content of an
element. You can think of it as a refinement of #PCDATA. XML Schema
provides a set of predefined
simple types and a way to define new simple types. The latter can
be used to constrain the content model and the attributes of an XML
element. An XML Schema complex type is strictly more expressive than a DTD
element declaration.
- Element declarations
An element declaration links an element name to a type (simple or complex).
Optionally, if the type is a simple type, it can constrain the set of
possible values for the element by mandating a fixed value or providing a
default value.
- Attribute group definitions
An attribute group definition links a set of attribute declarations to a
name which can be referenced from other XML Schema components.
- Model group definitions
A model group definition links a name to a constraint over the complex
content of an XML element. The linked name can be referenced from other
XML Schema components.
Attribute declarations currently do not produce any CDuce type
and cannot be used for validation themselves.
|
| XML Schema components import |
In order to import XML Schema components in CDuce, you first need to tell
CDuce to import an XML Schema document. You can do this using the
schema keyword to bind an uppercase identifier to a local
schema document:
# schema Mails = "tests/schema/mails.xsd";;
Registering schema type: attachmentType
Registering schema type: mimeTopLevelType
Registering schema type: mailsType
Registering schema type: mailType
Registering schema type: bodyType
Registering schema type: envelopeType
Registering schema element: header
Registering schema element: Date
Registering schema element: mails
Registering schema attribute group: mimeTypeAttributes
Registering schema model group: attachmentContent
The above declaration will (try to) import all schema components included in
the schema document mails.xsd
as CDuce types. You can reference them using the
dot operator, e.g. Mails.mails.
XML Schema permits ambiguity in component names. CDuce
resolves references to Schema components in this order:
elements, types, model groups, attribute groups.
The result of a schema component reference is an ordinary CDuce type which
you can use as usual in function definitions, pattern matching and so on.
let is_valid_mail (Any -> Bool)
| Mails.mailType -> `true
| _ -> `false
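Since component references are ordinary types, they can also appear in patterns inside other constructs. The following is a minimal sketch, assuming the Mails binding shown above; the helper name valid_mails is hypothetical and not part of CDuce or of mails.xsd.

```cduce
(* Assumes the binding shown earlier: schema Mails = "tests/schema/mails.xsd";; *)

(* Hypothetical helper: keep only the items of a sequence that already
   have the imported type Mails.mailType. Items that do not match the
   pattern are dropped, since transform maps them to the empty sequence. *)
let valid_mails (l : [Any*]) : [Mails.mailType*] =
  transform l with (m & Mails.mailType) -> [ m ]
```

Such a check is purely a type test: an XML value whose leaves are still plain strings will generally need to go through validation before it can match a schema type that contains non-string leaves.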
|
| Correctness remark: while parsing XML Schema documents, CDuce
assumes that they are correct with respect to the XML Schema recommendations.
At a minimum they are required to be valid with respect to the XML
Schema for Schemas. It is recommended that you check your schemas for
validity before importing them in CDuce; strange behaviour is to be
expected otherwise.
|
| Toplevel directives |
The toplevel directive #env supports schemas: it lists the
currently defined schemas.
The toplevel directive #print_type supports schemas too: it can
be used to print the types corresponding to schema components.
# #print_type Mails.bodyType;;
[ Char* ]
For more information have a look at the manual section about toplevel directives.
|
| XML Schema → CDuce mapping |
XML Schema predefined simple types are mapped to CDuce types
directly in the CDuce implementation, preserving XML Schema constraints
as much as possible. The table below lists the most significant mappings.
XML Schema predefined simple type | CDuce type |
---|---|
duration, dateTime, time, date, gYear, gMonth, ... | closed record types with some of the following fields (depending on the Schema type): year, month, day, hour, minute, second, timezone |
boolean | Bool |
anySimpleType, string, base64Binary, hexBinary, anyURI | String |
integer | Int |
nonPositiveInteger, negativeInteger, nonNegativeInteger, positiveInteger, long, int, short, byte | integer intervals with the appropriate limits |
string, normalizedString, and the other types derived (directly or indirectly) by restriction from string | String |
NMTOKENS, IDREFS, ENTITIES | [String*] |
decimal, float, double | Float |
(Not properly supported) decimal, float, double, NOTATION, QName | String |
Simple type definitions are built from the above types following
the XML Schema derivation rules.
XML Schema complex type definitions are mapped to CDuce types
representing XML elements which can have any tag, but whose attributes
and content are constrained to be valid with respect to the original
complex type.
As an example, the following XML Schema complex type (a simplified
version of the homonymous envelopeType defined in mails.xsd):
<xsd:complexType name="envelopeType">
<xsd:sequence>
<xsd:element name="From" type="xsd:string"/>
<xsd:element name="To" type="xsd:string"/>
<xsd:element name="Date" type="xsd:dateTime"/>
<xsd:element name="Subject" type="xsd:string"/>
</xsd:sequence>
</xsd:complexType>
will be mapped to a CDuce XML type which can have any tag and must have
four children: From, To, Date and Subject. Among them, the Date
child must be an XML element containing a record which represents a
dateTime Schema type.
# #print_type Mails.envelopeType;;
<(Any)>[
  <From>String
  <To>String
  <Date>{
    positive = Bool;
    year = Int; month = Int; day = Int;
    hour = Int; minute = Int; second = Int;
    timezone =? { positive = Bool; hour = Int; minute = Int }
  }
  <Subject>String
]
XML Schema element declarations can bind an XML element either
to a complex type or to a simple type. In the former case the conversion
is almost identical to what we have seen for complex type conversion.
The only difference is that this time the element's tag must correspond to
the name of the XML element in the schema element declaration, whereas
previously it could be any tag (type Any).
In the latter case (element with simple type content), the corresponding
CDuce type is an element type. Its tag must correspond to the name of
the XML element in the schema element declaration; its content type is
the CDuce translation of the simple type provided in the element
declaration.
For example, the following XML Schema element (corresponding to the
homonymous element defined in mails.xsd):
<xsd:element name="header">
<xsd:complexType>
<xsd:simpleContent>
<xsd:extension base="xsd:string">
<xsd:attribute ref="name" use="required" />
</xsd:extension>
</xsd:simpleContent>
</xsd:complexType>
</xsd:element>
will be translated to the following CDuce type:
# #print_type Mails.header;;
<header name = String>String
Note that the type of the element content is a sequence only when
the translation of the XML Schema type is itself a sequence, as in the
example above, where String is a sequence of characters. Compare it with
the following type, where the element content is not a sequence but a single record:
# #print_type Mails.Date;;
<Date>{
positive = Bool;
year = Int; month = Int; day = Int; hour = Int;
minute = Int; second = Int;
timezone =? { positive = Bool; hour = Int; minute = Int }
}
XML Schema wildcards (xsd:any)
and nullable elements (xsi:nil) are supported.
XML Schema attribute group definitions are mapped to record types
containing one field for each attribute declaration contained in the
group. use constraints are respected: optional attributes are
mapped to optional fields, required attributes to required
fields. XML Schema attribute wildcards are partly supported:
they simply produce open record types instead of closed ones,
but the actual constraints of the wildcards are discarded.
The following XML Schema attribute group declaration:
<xsd:attributeGroup name="mimeTypeAttributes">
<xsd:attribute name="type" type="mimeTopLevelType" use="required" />
<xsd:attribute name="subtype" type="xsd:string" use="required" />
</xsd:attributeGroup>
will thus be mapped to the following CDuce type:
# #print_type Mails.mimeTypeAttributes;;
{ type = [
'image' | 'text' | 'application' | 'audio' | 'message' | 'multipart' | 'video'
];
subtype = String }
XML Schema model group definitions are mapped to CDuce sequence
types. minOccurs and maxOccurs constraints are
respected, using CDuce recursive types to represent unbounded
repetition (i.e. Kleene star).
all constraints, also known as interleaving
constraints, cannot be expressed in the CDuce type system without an
explosion in type size. Content models of this kind are therefore normalized
and treated, in the type system, as sequence types (the
validator will reorder the actual XML documents).
Mixed content models are supported.
As an example, the following XML Schema model group definition:
<xsd:group name="attachmentContent">
<xsd:sequence>
<xsd:element name="mimetype">
<xsd:complexType>
<xsd:attributeGroup ref="mimeTypeAttributes" />
</xsd:complexType>
</xsd:element>
<xsd:element name="content" type="xsd:string" minOccurs="0" />
</xsd:sequence>
</xsd:group>
will be mapped to the following CDuce type:
# #print_type Mails.attachmentContent;;
[ X1 <content>String | X1 ] where
X1 = <mimetype Mails.mimeTypeAttributes>[ ]
|
| XML Schema validation |
The processes of XML Schema validation and assessment check that an XML
Schema instance document is valid with respect to an XML Schema document and
add missing information such as default values. CDuce's notion of Schema
validation is a bit different.
CDuce allows XML values to be built from arbitrary types; for example, you
can have XML elements which have integer attributes. Still, this feature is
rarely used because the function used to load XML documents
(load_xml) returns XML values whose leaves are values of
type PCDATA.
Once you have imported an XML Schema in CDuce, you can use it to validate an
XML value returned by load_xml against an XML Schema component
defined in it. The process of validation basically builds a CDuce value
whose type is the CDuce translation of the XML Schema type of
the component used in validation. The conversion is the same as the one
described in the previous section. Note that it is not strictly necessary that
the input XML value comes from load_xml; it is enough that it has
PCDATA values as leaves.
During validation, PCDATA strings are parsed to build CDuce values
corresponding to XML Schema simple types, and whitespace is handled as
specified by the XML Schema whiteSpace facet. For example,
validating the 1234567890 PCDATA string against an
xsd:integer simple type will return the CDuce value
1234567890 typed with type Int.
Default values for missing attributes or elements are also added where
specified.
You can use the validate keyword to perform validation in a CDuce
program. The syntax is as follows: validate <expr> with
<schema_ref>, where schema_ref is defined as described
in XML Schema components import. The same ambiguity rules
apply here.
In more detail, validation can be applied to different kinds of CDuce values,
depending on the kind of Schema component used for validation.
The typical use of validation is to validate against an element
declaration. In such a case validate should be invoked on an XML
CDuce value, as in the following example.
# let xml = <Date>"2003-10-15T15:44:01Z" in
validate xml with Mails.Date;;
- : Mails.Date =
<Date> {
time_kind=`dateTime;
positive=`true;
year=2003; month=10; day=15;
hour=15; minute=44; second=1;
timezone={ positive=`true; hour=0; minute=0 }
}
The tag of the given element is checked for consistency with the
element declaration; attributes and content are checked against the
Schema type declared for the element.
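Validation against an element declaration can be wrapped in ordinary functions. The sketch below assumes the Mails binding used on this page and the record shape printed for Mails.Date; the helper names parse_date and year_of are hypothetical.

```cduce
(* Assumes the binding from this page: schema Mails = "tests/schema/mails.xsd";; *)

(* Hypothetical helper: wrap an xsd:dateTime string in a <Date> element
   and validate it against the Date element declaration. *)
let parse_date (s : String) : Mails.Date =
  let xml = <Date>s in
  validate xml with Mails.Date

(* Hypothetical helper: read the year field of the validated record. *)
let year_of (d : Mails.Date) : Int =
  match d with <Date>r -> r.year
```

A call such as year_of (parse_date "2003-10-15T15:44:01Z") would then work on the typed record rather than on the original string.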
Sometimes you may want to validate an element against an XML Schema
complex type without having to use element declarations. This
case is very similar to the previous one, except that the
Schema component you use is a complex type definition and that you can
apply such a validation to any XML value. The other important difference
is that the tag name of the given value is completely ignored.
As an example:
# let xml = load_xml "envelope.xml" ;;
val xml : Any = <ignored_tag From="fake@microsoft.com">[
<From>[ 'user@unknown.domain.org' ]
<To>[ 'user@cduce.org' ]
<Date>[ '2003-10-15T15:44:01Z' ]
<Subject>[ 'I desperately need XML Schema support in CDuce' ]
<header name="Reply-To">[ 'bill@microsoft.com' ]
]
# validate xml with Mails.envelopeType;;
- : Mails.envelopeType =
<ignored_tag From="fake@microsoft.com">[
<From>[ 'user@unknown.domain.org' ]
<To>[ 'user@cduce.org' ]
<Date> {
time_kind=`dateTime;
positive=`true;
year=2003; month=10; day=15;
hour=15; minute=44; second=1;
timezone={ positive=`true; hour=0; minute=0 }
}
<Subject>[ 'I desperately need XML Schema support in CDuce' ]
<header name="Reply-To">[ 'bill@microsoft.com' ]
]
Similarly, you may want to validate against a model group. In this
case you validate CDuce sequences against model groups; the given
sequences are treated as the content of XML elements.
As an example:
# let xml = load_xml "attachment.xml";;
val xml : Any =
<ignored_tag ignored_attribute="foo">[
<mimetype type="application"; subtype="msword">[ ]
<content>[ '\n ### removed by spamoracle ###\n ' ]
]
# let content = match xml with <_>cont -> cont | _ -> raise "failure";;
val content : Any = [
<mimetype type="application"; subtype="msword">[ ]
<content>[ '\n ### removed by spamoracle ###\n ' ]
]
# validate content with Mails.attachmentContent;;
- : Mails.attachmentContent =
[ <mimetype type="application"; subtype="msword">[ ]
<content>[ '\n ### removed by spamoracle ###\n ' ]
]
Finally, it is possible to validate records against attribute groups.
All required attributes declared in the attribute group must have
corresponding fields in the given record. The content of each field is
validated against the simple type defined for the corresponding attribute
in the attribute group. Non-required fields that are missing are added using
the corresponding default value (if any).
As an example:
# let record = { type = "image"; subtype = "png" };;
val record :
{ type = [ 'image' ] subtype = [ 'png' ] } =
{ type="image" subtype="png" }
# validate record with Mails.mimeTypeAttributes ;;
- : { type = [ 'image' | 'text' | ... ] subtype = String } =
{ type="image" subtype="png" }
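The same validation can be applied to records computed at run time, whose static type is only String. The following is a minimal sketch under the assumption that the Mails binding is in scope; mime_attributes is a hypothetical helper name.

```cduce
(* Assumes the binding from this page: schema Mails = "tests/schema/mails.xsd";; *)

(* Hypothetical helper: build a record from two strings and validate it
   against the mimeTypeAttributes attribute group, so that the result
   carries the more precise type of the group. *)
let mime_attributes (t : String, s : String) : Mails.mimeTypeAttributes =
  let record = { type = t; subtype = s } in
  validate record with Mails.mimeTypeAttributes
```

For instance, mime_attributes ("image", "png") corresponds to the record validated in the transcript above.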
|
| XML Schema instances output |
It is possible to use the normal print_xml
and print_xml_utf8 built-in functions to print
values resulting from XML Schema validation.
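As an illustration, loading, validating and printing can be chained as sketched below. The sketch assumes the Mails binding and a local file envelope.xml like the one used in the validation section.

```cduce
(* Assumes: schema Mails = "tests/schema/mails.xsd";; and a local
   file envelope.xml like the one shown in the validation section. *)
let xml = load_xml "envelope.xml" in
let valid = validate xml with Mails.envelopeType in
(* print_xml serializes the validated value to a string,
   which print writes to standard output. *)
print (print_xml valid);;
```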
|
| Unsupported XML Schema features |
The support for XML Schema embedded in CDuce does not attempt
to cover the full XML Schema specification. In particular,
imported schemas are not checked for validity. You can use,
for instance, an
on-line validator to check the validity of a schema.
Also, some features of the XML Schema specification are not supported, or
are only partially supported. Here is a non-exhaustive list of limitations:
- Substitution groups.
- Some facets (pattern, totalDigits, fractionDigits).
- <redefine> (inclusion of an XML Schema with modifications).
- xsi:type.
|