In all these functions we have used the pattern _* to match, and
thus discard, the rest of a sequence. This is simply a particular regular expression over types. Type regexps can be used in patterns to match subsequences of a value. For instance the pattern
<person ..>[ _ _ Tel+] matches all person elements that specify no Email element and at least one Tel element. It is often useful
to bind the sequence captured by a regular expression pattern to a variable. But since a regexp is not a type, we cannot write, say, x&Tel+. We therefore introduce the special notation x::R, which binds x to the sequence matched by the type regular expression R. For instance:
let domain (Email -> String) <_>[ _*? d::(Echar+ '.' Echar+) ] -> d
returns the last two parts of the domain of an e-mail address (*? is
the ungreedy version of *; see regular expression patterns).
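For readers more used to string regular expressions, the effect of the ungreedy prefix can be imitated in Python. This is only a rough sketch, not CDuce: it assumes that Echar covers ASCII letters, digits and underscore (its definition is not shown in this section), and the names ECHAR, domain and domain_greedy are illustrative.

```python
import re

# Assumption: Echar is taken to be ASCII letters, digits and underscore.
ECHAR = r"[A-Za-z0-9_]"


def domain(address: str) -> str:
    """Imitate [ _*? d::(Echar+ '.' Echar+) ]: shortest prefix, capture the rest."""
    m = re.fullmatch(rf".*?({ECHAR}+\.{ECHAR}+)", address)
    if m is None:
        raise ValueError("no trailing 'label.label' part found")
    return m.group(1)


def domain_greedy(address: str) -> str:
    """Same pattern with a greedy prefix, shown only for comparison."""
    m = re.fullmatch(rf".*({ECHAR}+\.{ECHAR}+)", address)
    if m is None:
        raise ValueError("no trailing 'label.label' part found")
    return m.group(1)


print(domain("user@mail.example.org"))         # example.org
print(domain_greedy("user@mail.example.org"))  # e.org
```

In this imitation the greedy prefix swallows as much as it can and leaves only the shortest suffix that still matches, which is why the ungreedy form is the one that yields the last two parts of the domain.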
If these ::-captures are used inside the scope of the regular expression
operators * or +, or if the same variable
appears several times in a regular expression,
then the variable is bound to
the concatenation of all the corresponding matches. This is one of the
distinctive and powerful characteristics of CDuce, since it makes it possible to
define patterns that, in a single match, capture subsequences of
non-consecutive elements. For instance:
type PhoneItem = {name = String; phones = [String*] }
let agendaitem (Person -> PhoneItem)
  <person ..>[<name>n _ (t::Tel | _)*] ->
    { name = n ; phones = map t with <tel ..> s -> s }
transforms a person element into a record value with two fields containing
the element's name and the list of all the phone numbers. This is
obtained thanks to the pattern (t::Tel | _)* that binds to t the
sequence of all Tel elements appearing in the person. By the same rationale the pattern
( w::<tel kind="work">_ | t::<tel kind=?"home">_ | e::<email>_ )*
partitions the (Tel | Email)*
sequence into three subsequences, binding the list of work phone numbers to
w, the list of other numbers to t, and the list of e-mails to e. Alternative patterns
| follow a first-match policy (the second pattern is tried
only if the first fails). Thus we can write a shorter pattern that is equivalent when applied to (Tel|Email)* sequences:
( w::<tel kind="work">_ | t::Tel | e::_ )*
Both patterns are compiled into
( w::<tel kind="work">_ | t::<tel ..>_ | e::_)*
since checking the tag suffices to determine if the element is of type Tel.
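The accumulation of captures under * and the first-match policy can be imitated outside CDuce with an explicit loop. The Python sketch below is an illustration under stated assumptions, not a description of how CDuce compiles the pattern: elements are modelled as hypothetical (tag, attributes, content) tuples, partition is an illustrative name, and the sample data is made up.

```python
def partition(elements):
    """Imitate ( w::<tel kind="work">_ | t::Tel | e::_ )* on a (Tel|Email)* sequence."""
    w, t, e = [], [], []
    for elem in elements:
        tag, attrs, _content = elem
        # Alternatives are tried left to right; the first one that matches wins.
        if tag == "tel" and attrs.get("kind") == "work":
            w.append(elem)   # w::<tel kind="work">_
        elif tag == "tel":
            t.append(elem)   # t::Tel (any remaining tel element)
        else:
            e.append(elem)   # e::_ (everything else, here only e-mails)
    return w, t, e


# Made-up sample data in the assumed (tag, attributes, content) encoding.
contacts = [
    ("tel", {"kind": "work"}, "314-1592"),
    ("email", {}, "user@example.org"),
    ("tel", {}, "271-8281"),
    ("tel", {"kind": "home"}, "161-8033"),
]

w, t, e = partition(contacts)
print([c for _, _, c in w])  # ['314-1592']
print([c for _, _, c in t])  # ['271-8281', '161-8033']
print([c for _, _, c in e])  # ['user@example.org']
```

Each variable ends up holding the concatenation of its matches in document order, even though the matched elements are not consecutive.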
Storing phone numbers in integers rather than in strings requires minimal
modifications. It suffices to use a pattern regular expression to strip off
the possible occurrence of a dash:
let agendaitem2 (Person -> {name=String; phones=[Int*]})
  <person ..>[ <name>n _ (t::Tel|_)* ] ->
    { name = n; phones = map t with <tel ..>[(s::'0'--'9'|_)*] -> int_of s }
In this case s extracts the subsequence formed only by numerical
characters; therefore int_of s cannot fail, because s
has type [ '0'--'9'+ ] (otherwise, the system would have issued a
warning). Actually, the type system deduces for s the more precise type
[ '0'--'9'+ '0'--'9'+ ] (a subtype of the former), since there are always
at least two digits.
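For comparison, the same extraction can be sketched in Python. This is an imitation of the pattern's behaviour, not CDuce, and phone_to_int is an illustrative name. Unlike the CDuce version, nothing here is checked statically: the conversion can fail at run time if the input contains no digit at all.

```python
def phone_to_int(content: str) -> int:
    """Imitate [(s::'0'--'9'|_)*] followed by int_of s."""
    # Keep only the characters in the range '0'--'9'; everything else is skipped.
    digits = "".join(ch for ch in content if "0" <= ch <= "9")
    # int() raises ValueError on an empty string; CDuce rules this case out by typing.
    return int(digits)


print(phone_to_int("314-1592"))  # 3141592
```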
First use of overloading
Consider the type declaration
type PhoneBook = <phonebook>[PhoneItem*]
If we
add a new pattern-matching branch to the definition of the function
names, we make it work with both ParentBook and
PhoneBook elements. This yields the following overloaded function:
let names3 (ParentBook -> [Name*] ; PhoneBook -> [String*])
  | <parentbook> x -> (map x with <person ..>[ n _* ] -> n)
  | <phonebook> x -> (map x with { name=n } -> n)
The overloaded nature of names3 is expressed by its interface, which
states that when the function is applied to a ParentBook element it returns
a list of names, while if applied to a PhoneBook element it
returns a list of strings. We can factor the two branches into a single
alternative pattern:
let names4 (ParentBook -> [Name*] ; PhoneBook -> [String*])
  <_> x -> map x with ( <person ..>[ n _* ] | { name=n } ) -> n
The interface ensures that the two representations will never mix.
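The shape of names4 can be imitated in a dynamically typed language, which also shows what the interface adds. The Python sketch below is an assumption-laden illustration: books are modelled as hypothetical (tag, items) pairs, person elements as ("person", children) tuples, phone items as dictionaries, and the sample data is made up. Nothing in it prevents a caller from mixing the two representations in one book; in CDuce that guarantee comes from the declared interface.

```python
def names(book):
    """Imitate names4: a single function for both kinds of book."""
    _tag, items = book
    result = []
    for item in items:
        if isinstance(item, tuple):
            # Corresponds to <person ..>[ n _* ]: the first child is the name.
            _person_tag, children = item
            result.append(children[0])
        else:
            # Corresponds to the record alternative { name=n }.
            result.append(item["name"])
    return result


# Made-up sample data in the assumed encoding.
parent_book = ("parentbook", [
    ("person", [("name", "Ada"), ("children", [])]),
    ("person", [("name", "Alan"), ("children", [])]),
])
phone_book = ("phonebook", [{"name": "Ada", "phones": ["314-1592"]}])

print(names(parent_book))  # [('name', 'Ada'), ('name', 'Alan')]
print(names(phone_book))   # ['Ada']
```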