| Types and patterns |
In CDuce, a type denotes a set of values, and a pattern
extracts sub-values from a value. Syntactically, types and patterns
are very close. Indeed, any type can be seen as a pattern
(which accepts any value and extracts nothing), and a pattern
without any capture variable is nothing but a type.
Moreover, values
also share a common syntax with types and patterns. This is motivated
by the fact that basic and constructed values (that is, any values without
functional values inside) are themselves singleton types.
For instance, (1,2) is at once a value, a type, and a pattern.
As a type, it can be interpreted as a singleton type,
or as a pair type made of two singleton types.
As a pattern, it can be interpreted as a type constraint,
or as a pair pattern of two type constraints.
In this page, we present all the types and patterns that CDuce recognizes.
It is also the occasion to present the CDuce values themselves, the
corresponding expression constructions, and fundamental operations on them.
|
| Capture variables and default patterns |
A value identifier inside a pattern behaves as a capture variable:
it accepts and binds any value.
Another form of capture variable is the default value pattern
( x := c ) where x
is a capture variable (that is, an identifier),
and c is a scalar constant.
The semantics of this pattern is to bind the capture variable
to the constant, disregarding the matched value (and accepting
any value).
Such a pattern is useful in conjunction with the first match policy
(see below) to define "default cases". For instance, the pattern
((x & Int) | (x := 0), (y & Int) | (y := 0))
accepts any pair and binds x to the left component
if it is an integer (and 0 otherwise), and similarly
for y with the right component of the pair.
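The default-case idiom above can be sketched as a small function. The name ints_or_zero is hypothetical and only serves the illustration; the pattern is the one given in the text.

```cduce
(* Hypothetical helper: every non-integer component of a pair is replaced by 0. *)
let ints_or_zero (p : (Any,Any)) : (Int,Int) =
  match p with
    ((x & Int) | (x := 0), (y & Int) | (y := 0)) -> (x,y)

let a = ints_or_zero (3, 'c')     (* x is bound to 3, y falls back to 0 *)
let b = ints_or_zero (`nil, 7)    (* x falls back to 0, y is bound to 7 *)
```

Because of the first match policy, the default (x := 0) is only used when (x & Int) fails.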
|
| Boolean connectives |
CDuce recognizes the full set of boolean connectives, whose
interpretation is purely set-theoretic.
- Empty denotes the empty type (no value).
- Any and _ denote the universal type (all the values); the preferred notation is Any for types
and _ for patterns, but they are strictly equivalent.
- & is the conjunction boolean connective.
The type t1 & t2 has all the values
that belong to both t1 and t2.
Similarly, the pattern p1 & p2 accepts
all the values accepted by both sub-patterns; a capture variable
cannot appear on both sides of this pattern.
- | is the disjunction boolean connective.
The type t1 | t2 has all the values
that belong either to t1 or to t2.
Similarly, the pattern p1 | p2 accepts
all the values accepted by any of the two sub-patterns;
if both match, the first match policy applies, and p1
dictates how to capture sub-values. The two sub-patterns
must have the same set of capture variables.
- \ is the difference boolean connective.
The left-hand side can be a type or a pattern, but the right-hand side
is necessarily a type (no capture variable).
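As an illustration of the connectives, here is a minimal sketch. The type and function names are hypothetical, and the integer division operator is assumed to be written div.

```cduce
type NonZero = Int \ 0        (* every integer except 0 *)
type Digit   = Int & 0--9     (* same set of values as 0--9 *)
type Scalar  = Int | Char     (* integers and characters *)

let kind (v : Scalar) : String =
  match v with
    Int -> "integer"
  | Char -> "character"

(* The difference is used inside a pattern: its right-hand side is a type. *)
let safe_div (p : (Int,Int)) : Int =
  match p with
    (a, b & (Int \ 0)) -> a div b
  | _ -> 0
```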
|
| Recursive types and patterns |
A set of mutually recursive types can be defined
by toplevel type declarations, as in:
type T1 = <a>[ T2* ]
type T2 = <b>[ T1 T1 ]
It is also possible to use the syntax
T where T1 = t1 and ... and Tn = tn
where T and the Ti are type identifiers
and the ti are type expressions. The same notation
works for recursive patterns (for which there are no toplevel declarations).
There is an important restriction concerning recursive types:
any cycle must cross a type constructor (pairs, records, XML
elements, arrows). Boolean connectives do not count as type
constructors! The code sample above is a correct definition.
The one below is invalid, because there is an unguarded cycle
between T and S.
type T = S | (S,S) (* INVALID! *)
type S = T (* INVALID! *)
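For comparison, the following sketch shows a guarded recursive definition in both notations. The names IntList, IntList2, L and length are hypothetical; the function assumes that a function declared with let may call itself.

```cduce
(* Toplevel declaration: the cycle crosses the pair constructor. *)
type IntList = (Int, IntList) | `nil

(* The same set of values, written with the local "where" notation. *)
type IntList2 = L where L = (Int, L) | `nil

let length (l : IntList) : Int =
  match l with
    (_, tl) -> 1 + length tl
  | `nil -> 0
```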
|
| Scalar types |
CDuce has three kinds of atomic (scalar) values:
integers, characters, and atoms. To each kind corresponds a family of types.
- Integers.
CDuce integers are arbitrarily large. An integer
literal is a sequence of decimal digits, plus an optional leading unary
minus (-) character.
- Int: all the integers.
- i--j (where i and
j are integer literals, or *
for infinity): integer interval. E.g.: 100--*,
*--0[1]
(note that * stands both
for plus and minus infinity).
- i (where i is an integer
literal): integer singleton type.
- Floats.
CDuce provides minimal features for floats. The only way to
construct a value of type Float is by the function
float_of : String -> Float.
- Characters.
CDuce manipulates Unicode characters. A character
literal is enclosed in single quotes, e.g. 'a', 'b', 'c'.
The single quote and the backslash character must be escaped
by a backslash: '\'', '\\'. The double
quote can also be escaped, but this is not mandatory.
The usual '\n', '\t', '\r' are recognized.
Arbitrary Unicode codepoints can be written in decimal
'\i;' (i is a decimal integer; note that the code is ended by a semicolon) or
in hexadecimal '\xi;'. Any other occurrence of
a backslash character is prohibited.
- Char: all the Unicode character set.
- c--d (where c and
d are character literals):
interval of Unicode character set. E.g.: 'a'--'z'.
- c (where c is a character
literal): character singleton type.
- Byte: all the Latin1 character set
(equivalent to '\0;'--'\255;').
- Atoms.
Atoms are symbolic elements. They are used in particular
to denote XML tag names, and also to simulate ML sum type
constructors and exceptions names.
An atom is written `xxx where
xxx follows the rules for CDuce identifiers.
E.g.: `yes, `No, `my-name. The atom `nil
is used to denote empty sequences.
- Atom: all the atoms.
- a (where a is an atom
literal): atom singleton type.
- Bool: the two atoms `true and
`false.
- See also: XML Namespaces.
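The scalar types listed above can be combined in an ordinary pattern matching. The following is a minimal sketch; the type names and the function classify are hypothetical.

```cduce
type Percent = 0--100
type Lower   = 'a'--'z'
type Answer  = `yes | `no

let classify (v : Int | Char | Atom) : String =
  match v with
    0--100 -> "percentage"
  | Int -> "other integer"
  | 'a'--'z' -> "lowercase letter"
  | Char -> "other character"
  | (`yes | `no) -> "answer"
  | Atom -> "other atom"
```

Branches are tried in order, so the interval 0--100 must come before the more general Int.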
|
| Pairs |
Pairs are a fundamental notion in CDuce, as they constitute a building
block for sequences. Even if syntactic sugar somewhat hides
pairs when you use sequences, it is good to know the existence of pairs.
A pair expression is written (e1,e2)
where e1 and e2 are expressions.
Similarly, pair types and patterns are written
(t1,t2) where t1 and
t2 are types or patterns. E.g.: (Int,Char).
When a capture variable x appears on both
sides of a pair pattern p = (p1,p2), the semantics
is the following: when a value matches p,
if x is bound to v1 by
p1 and to v2 by
p2,
then x is bound to the pair (v1,v2) by
p.
Tuples are syntactic sugar for pairs. For instance,
(1,2,3,4) denotes (1,(2,(3,4))).
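The rules above can be sketched as follows. The function names are hypothetical, and the second function relies on the rule just stated for a capture variable that occurs on both sides of a pair pattern.

```cduce
let swap (p : (Int,Char)) : (Char,Int) =
  match p with (x,y) -> (y,x)

(* x occurs on both sides, so it is bound to the pair of both components. *)
let rebuild (p : (Int,Char)) : (Int,Char) =
  match p with (x,x) -> x

(* A 4-tuple is a nested pair, so it can be matched as one. *)
let third (t : (Int,Int,Int,Int)) : Int =
  match t with (_,(_,(c,_))) -> c
```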
|
| Sequences |
Values and expressions
Sequences are fundamental in CDuce. They represent
the content of XML elements, and also character strings.
Actually, they are only syntactic sugar over pairs.
Sequence expressions are written inside square brackets; elements
are simply separated by whitespace:
[ e1 e2 ... en ].
Such an expression is syntactic sugar for:
(e1,(e2, ... (en,`nil) ...)).
E.g.: [ 1 2 3 4 ].
The binary operator @ denotes sequence concatenation.
E.g.: [ 1 2 3 ] @ [ 4 5 6 ] evaluates to
[ 1 2 3 4 5 6 ].
It is possible to specify a terminator different from `nil;
for instance
[ 1 2 3 4 ; q ] denotes (1,(2,(3,(4,q)))),
and is equivalent to
[ 1 2 3 4 ] @ q.
Inside the square brackets of a sequence expression, it is possible
to have elements of the form ! e (which is not
an expression by itself), where e is an expression
which should evaluate to a sequence. The semantics is
to "open" e. For instance:
[ 1 2 ![ 3 4 ] 5 ]
evaluates to
[ 1 2 3 4 5 ].
Consequently, the concatenation of two sequences e1 @ e2
can also be written [ !e1 !e2 ]
or [ !e1 ; e2 ].
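The equivalences above, gathered in one short sketch (the variable names are only illustrative):

```cduce
let s1 = [ 1 2 3 ]
let s2 = [ 4 5 6 ]

let a = s1 @ s2          (* [ 1 2 3 4 5 6 ] *)
let b = [ !s1 !s2 ]      (* same value as a *)
let c = [ !s1 ; s2 ]     (* same value as a *)
let d = [ 0 !s1 7 ]      (* [ 0 1 2 3 7 ] *)
```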
Types and patterns
In CDuce, a sequence can be heterogeneous: the elements can all have
different types. Types and patterns for sequences are specified
by regular expressions over types or patterns. The syntax is
[ R ] where R is a regular expression, which
can be:
- A type or a pattern, which corresponds to a single element in the
sequence (in particular, [ _ ] represents
sequences of length 1, not arbitrary sequences).
- A juxtaposition of regular expressions R1 R2
which represents concatenation.
- A postfix repetition operator; the greedy operators are
R?,
R+,
R*, and the ungreedy operators are:
R??,
R+?,
R*?. For types, there is no distinction in semantics between
greedy and ungreedy.
- A sequence capture variable x::R
(only for patterns, of course).
The semantics is to capture in x the subsequence
matched by R. The same sequence capture variable
can appear several times inside a regular expression, including
under repetition operators; in that case, all the corresponding
subsequences are concatenated together. Two instances of the
same sequence capture variable cannot be nested, as in
[x :: (1 x :: Int)].
Note the difference between [ x::Int ] and
[ (x & Int) ]. Both accept sequences made of a single
integer, but the first one binds x to a sequence
(of a single integer), whereas the second one binds it to
the integer itself.
- Grouping (R). E.g.: [ x::(Int Int) y ].
- Tail predicate /p. The type/pattern p
applies to the current tail of the sequence (the subsequence
starting at the current position). E.g.:
[ (Int /(x:=1) | /(x:=2)) _* ] will bind
x to 1 if the sequence starts
with an integer and 2 otherwise.
- Repetition R ** n where n
is a positive integer constant, which is just a shorthand
for the concatenation of n copies of R.
Sequence types and patterns also accept the [ ...; ... ]
notation. This is a convenient way to discard the tail of a sequence
in a pattern, e.g.: [ x::Int* ; _ ], which
is equivalent to [ x::Int* _* ].
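A few regular expression patterns in use, as a minimal sketch (the function names are hypothetical):

```cduce
(* x occurs under a repetition: all the matched integers are concatenated. *)
let ints (s : [ Any* ]) : [ Int* ] =
  match s with [ (x::Int | _)* ] -> x

(* x is a one-element sequence, whereas y is the integer itself. *)
let both (s : [ Int Int ]) : ([ Int ], Int) =
  match s with [ x::Int (y & Int) ] -> (x, y)

(* Grouping: capture the first two integers, discard the tail. *)
let first_two (s : [ Int* ]) : [ Int* ] =
  match s with
    [ x::(Int Int) ; _ ] -> x
  | _ -> []
```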
It is possible to use the @
operator (sequence concatenation) on types, including in recursive
definitions. E.g.:
type t = [ <a>(t @ t) ? ] (* [s?] where s=<a>[ s? s? ] *)
type x = [ Int* ]
type y = x @ [ Char* ] (* [ Int* Char* ] *)
type t = [Int] @ t | [] (* [ Int* ] *)
However, when used in recursive definitions, @ must be right-linear; for instance, the following definitions are not allowed:
type t = t @ [Int] | [] (* ERROR: Ill-formed concatenation loop *)
type t = t @ t (* ERROR: Ill-formed concatenation loop *)
|
| Strings |
In CDuce, character strings are nothing but sequences of characters.
The type String is pre-defined as [ Char* ].
This makes it possible to use the full power of regular expression
pattern matching with strings.
Inside a regular expression type or pattern, it is possible
to use PCDATA instead of Char*
(note that neither is a type on its own: they only make sense
inside square brackets, contrary to String).
The type Latin1 is the subtype of String
defined as [ Byte* ]; it denotes strings that can
be represented in the ISO-8859-1 encoding, that is, strings made only
of characters from the Latin1 character set.
Several consecutive character literals in a sequence can be
merged together between two single quotes:
[ 'abc' ] instead of [ 'a' 'b' 'c' ].
Also it is possible to avoid square brackets by using
double quotes: "abc". The same escaping rules apply
inside double quotes, except that single quotes may be escaped (but
need not be), and double quotes must be.
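Since strings are plain sequences of characters, sequence operations and regular expression patterns apply to them directly. A minimal sketch, with hypothetical names:

```cduce
let s1 = "abc"
let s2 = [ 'abc' ]
let s3 = [ 'a' 'b' 'c' ]      (* s1, s2 and s3 denote the same value *)

let greeting = "Hello, " @ "world"

(* Keep only the decimal digits of a string. *)
let digits (s : String) : String =
  match s with [ (d::('0'--'9') | _)* ] -> d
```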
|
| Records |
Records are finite sets of (name,value) bindings. They are used
in particular to represent XML attribute sets. Names are
actually Qualified Names (see XML Namespaces).
The syntax of a record expression is
{ l1=e1; ...; ln=en }
where the li are label names (same lexical
conventions as for identifiers), and the ei
are expressions. When an expression ei
is simply a variable whose name matches the field label
li, it is possible to omit it.
E.g.: { x; y = 10; z }
is equivalent to { x = x; y = 10; z = z }.
The semi-colons between fields are optional.
There are two kinds of record types. Open record types
are written { l1=t1; ...; ln=tn; .. },
and closed record types are written
{ l1 = t1; ...; ln = tn }.
Both denote all the record values where
the labels li are present and the associated values
are in the corresponding type. The distinction is that the open
type allows extra fields, whereas the closed type gives a strict
enumeration of the possible fields. The semi-colon between fields is optional.
Additionally, both for open and closed record types,
it is possible to specify optional fields by using =?
instead of = between a label and a type.
For instance, { x =? Int; y = Bool }
represents records with a y field of type
Bool, and an optional field x (that, when it is
present, has type Int), and no other field.
The syntax is the same for patterns. Note that capture variables
cannot appear in an optional field. A common idiom is to bind
default values to replace missing optional fields:
({ x = a } | (a := 1)) & { y = b }. A special syntax
makes this idiom more convenient:
{ x = a else (a:=1); y = b }.
As for record expressions, when the pattern
is simply a capture variable whose name matches the field label,
it is possible to omit it. E.g.: { x; y = b; z }
is equivalent to { x = x; y = b; z = z }.
The + operator (record concatenation, with priority given
to the right argument in case of overlapping) is available on record
types and patterns. This operator can be used to make a closed
record type/pattern open, or to add fields:
type t = { a=Int b=Char }
type s = t + {..} (* { a=Int b=Char .. } *)
type u = s + { c=Float } (* { a=Int b=Char c=Float .. } *)
type v = t + { c=Float } (* { a=Int b=Char c=Float } *)
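The optional field and default value idioms can be sketched as follows. The names Point, Opt, origin and coords are hypothetical.

```cduce
type Point = { x = Int; y = Int }
type Opt   = { x =? Int; y = Int }

let origin = { x = 0; y = 0 }

(* a falls back to 0 when the field x is missing. *)
let coords (r : Opt) : (Int,Int) =
  match r with { x = a else (a := 0); y = b } -> (a,b)

let c1 = coords { x = 3; y = 4 }    (* a is bound to 3, b to 4 *)
let c2 = coords { y = 4 }           (* a falls back to 0, b is bound to 4 *)
```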
|
| XML elements |
In CDuce, the general form of an XML element is
<(tag) (attr)>content where
tag,
attr and
content are three expressions.
Usually, tag is a tag literal `xxx, and
in this case, instead of writing <(`tag)>,
you can write: <tag>.
Similarly, when attr is a record literal, you can
omit the surrounding ({...}), and also the semicolon
between attributes,
E.g.: <a href="http://..." dir="ltr">[].
The syntax for XML elements types and patterns follows closely
the syntax for expressions:
<(tag) (attr)>content
where
tag,
attr and
content are three types or patterns.
As for expressions, it is possible to simplify the notations
for tags and attributes. For instance,
<(`a) ({ href=String })>[]
can be written:
<a href=String>[].
The following sample shows several ways to write XML types.
type A = <a x=String y=String ..>[ A* ]
type B = <(`x | `y) ..>[ ]
type C = <c x = String; y = String>[ ]
type U = { x = String y =? String ..}
type V = [ W* ]
type W = <v (U)>V
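Expressions, types and patterns for XML elements share the same shape, as the following sketch suggests. The type Link, the values and the URL are hypothetical.

```cduce
type Link = <a href=String>[ PCDATA ]

(* The content "home" is a string, that is, a sequence of characters. *)
let home = <a href="http://example.org/">"home"

(* The attribute pattern binds h to the value of href. *)
let target (l : Link) : String =
  match l with <a href=h>_ -> h
```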
|
| Functions |
CDuce is a higher-order functional language: functions are
first-class values, and can be passed as arguments, returned
as results, stored in data structures, etc.
A functional type has the form t -> s
where t and s are types.
Intuitively, this type corresponds to functions that accept
(at least) any argument of type t, and for
such an argument, return a value of type s.
For instance, the type (Int,Int) -> Int & (Char,Char) -> Char
denotes functions that map any pair of integers to an integer,
and any pair of characters to a character.
The explanation above gives the intuition behind the interpretation
of functional types. It is sufficient to understand which
subtyping relations and equivalences hold between (boolean
combinations of) functional types. For instance,
Int -> Int & Char -> Char is a subtype
of (Int|Char) -> (Int|Char) because
with the intuition above, a function of the first type,
when given a value of type Int|Char, returns
a value of type Int or of type Char
(depending on the argument).
Formally, the type t -> s denotes
CDuce abstractions
fun (t1 -> s1; ...; tn -> sn)...
such that t1 -> s1 & ... & tn ->
sn is a subtype of t -> s.
Functional types have no counterpart in patterns.
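The subtyping example above can be sketched with an overloaded abstraction. The names f, g, h and r are hypothetical.

```cduce
(* An abstraction with two arrows in its interface. *)
let f = fun (Int -> Int; Char -> Char)
    x & Int -> x + 1
  | c & Char -> c

(* g expects a function of type (Int|Char) -> (Int|Char). *)
let g (h : (Int|Char) -> (Int|Char)) : (Int|Char) = h 'a'

(* Accepted, because Int -> Int & Char -> Char is a subtype of
   (Int|Char) -> (Int|Char). *)
let r = g f
```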
|
| References |
References are mutable memory cells. CDuce has no built-in
reference type. Instead, references are implemented
in an object-oriented way. The type ref T
denotes references of values of type T. It
is only syntactic sugar for the type
{ get = [] -> T ; set = T -> [] }.
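A minimal usage sketch follows. It assumes the expression-level notations ref T e (creation), !e (dereference) and e1 := e2 (assignment), which are not described on this page; check them against the documentation of expressions before relying on them.

```cduce
let counter = ref Int 0              (* a new cell of type ref Int holding 0 *)
let before  = !counter               (* reads the cell *)
let _       = counter := before + 1  (* writes the cell; the result is [] *)
let after   = !counter               (* reads the updated content *)
```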
|
| Complete syntax |
Below we give the complete syntax of types and patterns, the former
being patterns without capture variables.
TO BE DONE |
| | [1]
You should be careful when putting parentheses around
a type of the form *--i. Indeed,
(*--i) would be parsed as a comment.
You have to put a whitespace after the left parenthesis.
|