Chapter 6 Definition of the Programming Language Malaga
6.1 Characterisation of Malaga
A Malaga rule file closely resembles a program in languages like Pascal or C (of
course, those languages do not have a Left Associative Grammar formalism built
in). A Malaga source file must be translated before execution, just as in
compiled languages. The generated Malaga code, however, is not machine code but
an intermediate code that has to be executed (interpreted) by an analysis program.
We may characterise Malaga as follows, as far as programming structures and
data structures are concerned:
- structured values: The basic values in Malaga are symbols (names that
can be used e.g. for categories or subcategories), numbers (floating point
numbers), and strings. Values can be combined into ordered lists or records
(also known as feature structures). A value in a list or a record can be a
list or a record itself. An ``ambiguous'' symbol like ``singular_plural'' can be assigned a list of symbols like ``<singular, plural>''; such a symbol is called a multi-symbol.
- structured statements: In Malaga, the concept of statement blocks is
implemented in a similar way as it is in the programming language Pascal.
There are structured control statements to select or repeat a statement
sequence. A variable is always defined locally, i.e. it only exists
from the point where it has been defined up to the end of the statement
sequence in which it has been defined.
- no type restrictions: Any value can be assigned to a variable and the
programmer can freely define the structure of values.
- no side effects: Malaga is, unlike programming languages like Pascal or
C, free of side effects. If a variable gets a value, no other variable will
be changed. Analysis paths are independent of each other.
- termination: A Malaga grammar that contains no recursive subrules and no
repeat statements is guaranteed to terminate, i.e. it can never hang
in a loop.
- variables: In a define statement, a variable is defined and gets
an initial value. Use an assignment to set a variable that has already
been defined to a new value.
- operators: Many generative grammar theories or linguistic programming
languages use the concept of unification of feature structures.
Malaga does not use unification, but it offers some operators to build lists
or records (feature structures) explicitly. Since Malaga does without
unification, analyses are much faster.
6.2 Malaga Source Texts
Source texts in Malaga are format-free; this means that between lexical symbols
(strings, identifiers, keywords, numerals and symbols such as ``+'',
``~'' or ``:='') there may be blanks or newlines (whitespaces) or
comments. Between two identifiers or two keywords there must be at
least one whitespace to separate them syntactically.
In this documentation, the syntax of the source text components is defined
formally in EBNF notation. The EBNF lines are printed in typewriter style and
headed by ``$$''.
$$ Comment ::= "#" {printing_char} .
A comment may be inserted anywhere a whitespace may be inserted. A
comment begins with the symbol ``#'' and extends to the end of the line.
Comments are ignored.
6.2.2 The include Statement
$$ Include ::= "include" String ";" .
A Malaga file may contain the statement
include "filename";
In a rule file, it can stand anywhere a rule can stand. In lexicon files, it
can stand in place of a value; in symbol files, it can replace a symbol
definition. The text of the included file is inserted verbatim at the
location where the include statement occurs. The file name has to be
given relative to the directory of the file which contains the include
statement.
6.2.3 Identifiers
$$ Identifier ::= (Letter | "_" | "&" | "|") {Letter | Digit | "_" | "&" | "|"} .
In Malaga, names for variables, constants, symbols, and rules (see below
for explanation) are called identifiers. An identifier may consist of
uppercase and lowercase characters, the underscore ``_'', the ampersand
``&'', the vertical bar ``|'', and, from the second character on,
also of digits. Uppercase and lowercase characters are not distinguished, i.e.,
Malaga is not case-sensitive. Malaga keywords must not be used as
identifiers. A variable name must start with a ``$'', a constant name
must start with a ``@''. The same identifier may be used as variable
name, constant name, symbol name, or rule name independently. Malaga can
distinguish them by the context in which they occur.
Valid identifiers would be ``Noun'', ``noun'' (the same as the
first), ``R2D2'', ``Vb_aux'', ``A|G|D'', ``_INF''.
Identifiers like ``2Noun'', ``Verb.Frame'', ``OK?'', ``_ INF'' are not valid.
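The identifier rule can be checked mechanically. The following Python sketch is only an illustration: it assumes that Letter means the ASCII letters (this section does not define it), that the vertical bar is allowed as the prose and the examples state, and it does not model the exclusion of keywords. The helper names `is_identifier` and `same_identifier` are hypothetical.

```python
import re

# Assumption: Letter = ASCII letters; "|" is allowed as stated in the prose.
_IDENTIFIER = re.compile(r"[A-Za-z_&|][A-Za-z0-9_&|]*")


def is_identifier(text: str) -> bool:
    """Return True if text has the shape of a Malaga identifier."""
    return _IDENTIFIER.fullmatch(text) is not None


def same_identifier(first: str, second: str) -> bool:
    """Identifiers are not case-sensitive, so compare them case-folded."""
    return first.lower() == second.lower()
```

Applied to the examples above, the sketch accepts ``Noun'', ``R2D2'', ``Vb_aux'', ``A|G|D'' and ``_INF'', and rejects ``2Noun'', ``Verb.Frame'', ``OK?'' and ``_ INF''.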
6.3 Values
Malaga expressions can have values with very complex structures. To describe
how those values can be composed from simple values a few rules suffice. Simple
values in Malaga are symbols, numbers, and strings, which
can be composed to form records and lists.
$$ Symbol ::= Identifier .
The central data type in Malaga is the symbol. It is used for describing
syntactic or semantic properties of an allomorph, a word, or a sentence. A
symbol is an identifier like ``Verb'', ``reflexive'', ``Sing_1''. The symbols ``nil'', ``yes'', ``no'', ``symbol'', ``string'', ``number'', ``list'', and ``record'' are predefined and have special meanings.
$$ Number ::= ["-"] Digit {Digit} ["." Digit {Digit}] ["E" ["+" | "-"] Digit {Digit}] .
A number in Malaga consists of an optional ``-'' sign, an integer part, an
optional fractional part and an optional exponent of the form ``E[+|-]n''. There must be a dot between the integer part and the
fractional part. Examples: ``0'', ``1'', ``1.0'', ``-13.75'', ``1.2E-5''.
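As an illustration of the number syntax described above, here is a small Python sketch. It follows the prose (optional sign, integer part, optional fractional part, optional exponent ``E[+|-]n''); whether a lowercase ``e'' is also accepted is not stated here, so the sketch accepts the uppercase form only. `parse_number` is a hypothetical helper name.

```python
import re

# Sign, integer part, optional fraction, optional exponent "E[+|-]n".
# Assumption: only an uppercase "E" introduces the exponent.
_NUMBER = re.compile(r"-?[0-9]+(\.[0-9]+)?(E[+-]?[0-9]+)?")


def parse_number(text: str) -> float:
    """Convert a Malaga number literal to a Python float."""
    if _NUMBER.fullmatch(text) is None:
        raise ValueError(f"not a Malaga number: {text!r}")
    return float(text)
```

Under these assumptions, literals such as ``.5'' or ``1.'' are rejected, because digits are required on both sides of the dot.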
$$ String ::= '"' {printing_char_except_double_quotes | '\"' | '\\'} '"' .
A string may consist of any number of characters (it may also be empty). It
must be enclosed in double quotes and must not extend over more than one line.
Within the double quotes there may be any combination of printable characters
except the backslash ``\'' and the double quotes. These characters must
be preceded by a ``\'' (escape character). Examples: "Hello", "He says: \"Great\"".
$$ List ::= "<" Expression {"," Expression} ">" .
A list is an ordered sequence of values. The values are separated by commas and
enclosed in angle brackets:
<element1, element2, ...>
A list may also be empty. The elements in a list may be arbitrarily complex;
they may also be lists or records.
$$ Record ::= "[" Symbol-Value-Pair {"," Symbol-Value-Pair} "]" .
$$ Symbol-Value-Pair ::= Expression ":" Expression .
A record is a collection of attributes. An attribute consists of a
symbol, the attribute name, and an associated attribute value,
which can be an arbitrary Malaga value. The attribute name serves as an access
key for the attribute value, so all attributes in a record must have different
names.
Records are noted down as follows:
[name1: value1, name2: value2, ...]
where each name (name1, name2, ...) denotes an attribute name and the corresponding
value its associated attribute value. Example: ``[Class: Verb, Reg: Reg, Val: dirObj]''.
A record with no attributes, ``[]'', is called the empty record.
6.4 Expressions
$$ Expression ::= ["-"] Term {("+" | "-") Term} .
$$ Term ::= Factor {("*" | "/") Factor} .
$$ Factor ::= Value {"." Value} .
$$ Value ::= Symbol | String | Number | List | Record | Constant
$$ | Subrule-Invocation | Variable | "(" Condition ")" .
$$ Constant-Expression ::= Expression .
An expression is the form in which a value is used in Malaga. Values can be
written as follows:
[Surf: "he", Class: Pron, Case&Number: S3]
Variables (these are placeholders for values within a rule) can also be used
as expressions:
$Pron
Furthermore, constants (placeholders for values in a rule file) can be used as
expressions:
@combination_table
All three forms can be mixed:
[Surf: "he", Class: Pron, Case&Number: $result]
Furthermore, there are operators which modify values or combine two values to
form a new value. Using those operators complex values can be composed. All
operators work left-associatively and have different priorities (an operator
with higher priority is applied before one with lower priority):
operator | priority
.        | 3
*, /     | 2
+, -     | 1
The order in which the operators are to be applied can be changed by bracketing
with round parentheses ``()''.
$$ Variable ::= "$" Identifier .
A variable is marked by a ``$'' preceding its name. The name may be any
valid identifier. A variable is defined by the define statement; it
receives a value and may from this point on be used in all expressions within
the statement sequence. In such a statement sequence (and all subordinated
statement sequences) a variable with the same name must not be defined again.
$$ Constant ::= "@" Identifier .
A constant is marked by a ``@'' preceding its name. The name may be any
valid identifier. A constant is defined by a constant definition in a rule
file, outside a rule. It is assigned a value and can be used in subsequent
rules and constant definitions in that rule file.
6.4.3 Subrule Invocations
$$ Subrule-Invocation ::= Rule-Name "(" Expression {"," Expression} ")" .
$$ Rule-Name ::= Identifier .
A subrule is invoked when an expression ``subrule (value1,
value2, ...)'' is evaluated. The expression yields the value that is
returned by the return statement in the subrule. The number of parameters
in a subrule invocation must match the number of parameters in the subrule
definition.
A number of subrules are predefined. They are called
functions and they all take exactly one parameter.
6.4.4 The Function ``atoms''
The expression ``atoms(symbol)'' yields the list of atomic
symbols for symbol. If symbol is not a multi-symbol, it yields
the list <symbol>.
6.4.5 The Function ``capital''
The expression ``capital (string)'' yields yes if the
first character of the string string is a capital letter, else it
yields no.
6.4.6 The Function ``length''
The expression ``length (list)'' yields the number of
elements in ``list''.
6.4.7 The Function ``multi''
The expression ``multi(list)'', where list is a
list of symbols, yields the multi-symbol whose atomic list corresponds to list. If list contains a single atomic symbol, the expression
yields this symbol.
6.4.8 The Function ``set''
The expression ``set(list)'' yields a list which contains
each element of list, but only once. That means, the list is converted to
a set.
6.4.9 The Function ``switch''
The expression ``switch (symbol)'' yields the current value
of the switch associated with ``symbol''. Use the option switch to
change this value.
6.4.10 The Function ``symbol_name''
The expression ``symbol_name (symbol)'' yields the name of
symbol as a string.
6.4.11 The Function ``transmit'' (malaga)
The expression ``transmit (value)'' writes value,
converted to text format, to the transmit process via pipe and reads a value in
text format from the transmit process via pipe. The answer is converted to the
internal Malaga value format and returned as the result of the expression.
When this function is evaluated, the transmit process is started if it has not
been started yet. The command line of the transmit process is specified by the
option transmit.
6.4.12 The Function ``truncate''
The expression ``truncate (number)'' yields the largest
integer number that is not greater than number.
6.4.13 The Function ``value_type''
The expression ``value_type (value)'' yields the type of
value. The type information is coded as one of the symbols ``symbol'', ``string'', ``number'', ``list'', or ``record''.
6.4.14 The Operator ``.''
This operator may only be used in the following ways:
- The expression ``record.symbol'' yields the
attribute value of the attribute of record whose name is symbol. If there is no attribute in record whose name is symbol, the expression yields the special symbol nil.
- The expression ``list.number'' yields the element of
list at position number. If there is no element at position
number in list, the expression yields the special symbol nil.
- The expression ``value.list'', where list
is a list <e1, e2, ...> of symbols and/or numbers, serves
as an abbreviation for ``value.e1.e2...''.
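The three uses of ``.'' can be modelled in a few lines of Python. This is a sketch of the behaviour described above, not Malaga's implementation: records are modelled as dicts, lists as Python lists, symbols as strings, and nil as the string "nil". It assumes that list positions are counted from 1, which this section does not state, and it treats any other use as an error. The function name `select` is hypothetical.

```python
NIL = "nil"  # stand-in for the special symbol nil


def select(value, key):
    """Model one application of value.key."""
    if isinstance(key, list):
        # value.<e1, e2, ...> abbreviates value.e1.e2...
        for step in key:
            value = select(value, step)
        return value
    if isinstance(value, dict):
        # record.symbol: a missing attribute yields nil
        return value.get(key, NIL)
    if isinstance(value, list) and isinstance(key, int):
        # list.number: assumption, positions start at 1
        return value[key - 1] if 1 <= key <= len(value) else NIL
    raise TypeError("'.' is not defined for these operands")
```

For example, with the record `{"Class": "Verb", "Val": ["dirObj", "indObj"]}`, the path `["Val", 2]` selects "indObj", while the attribute name "Tense" yields nil.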
6.4.15 The Operator ``+''
This operator may only be used in the following ways:
- The expression ``string1 + string2'' yields the
concatenation of string1 and string2.
- The expression ``list1 + list2'' yields the
concatenation of list1 and list2.
- The expression ``number1 + number2'' yields the sum
of number1 and number2.
- The expression ``record1 + record2'' yields a
record which consists of all attributes of record1 and record2. If
record1 and record2 have common attribute names, the
corresponding attributes in the result record will have the attribute values
from record2, in contrast to the operator ``*''.
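The record case of ``+'' is the one that most often surprises readers, so here is a minimal Python sketch of it. Records are modelled as dicts; the name `record_plus` is hypothetical, and the position of a shared attribute within the result is an assumption that the text does not specify.

```python
def record_plus(record1: dict, record2: dict) -> dict:
    """Model record1 + record2: all attributes of both records,
    with values from record2 where attribute names coincide."""
    merged = dict(record1)   # copy, so neither operand is modified
    merged.update(record2)   # record2 wins on common attribute names
    return merged
```

The operands are left untouched, in line with the absence of side effects in Malaga: `record_plus({"Class": "Verb", "Reg": "yes"}, {"Reg": "no"})` gives a new record with `Reg` set to "no".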
6.4.16 The Operator ``-''
This operator may only be used in the following ways:
- The expression ``record - symbol'' yields record without the attribute named symbol, if symbol is an
attribute name in record. If not, the expression yields record.
- The expression ``record - list'', where list is a list of symbols, yields record without the attributes
in list.
- The expression ``list - number'' yields list without the element at index number. If this element does not exist,
the expression yields list.
- The expression ``list1 - list2'' yields the
multi-set difference of the two lists list1 and list2. This
means, it yields the list list1, but the first n appearances of each
element will be deleted, if that element appears n times in list2.
- The expression ``number1 - number2'' yields the
difference of number1 and number2.
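The multi-set difference of two lists can be sketched in Python as follows. The sketch models list elements as hashable Python values (for example strings standing in for symbols), which is an assumption made for brevity; `list_minus` is a hypothetical name.

```python
from collections import Counter


def list_minus(list1: list, list2: list) -> list:
    """Model list1 - list2: delete the first n appearances of each
    element of list1 that appears n times in list2."""
    to_delete = Counter(list2)
    result = []
    for element in list1:
        if to_delete[element] > 0:
            to_delete[element] -= 1   # consume one appearance from list2
        else:
            result.append(element)
    return result
```

With list1 = <a, b, a, c, a> and list2 = <a, a, d>, the first two appearances of a are deleted and the third one survives, so the result is <b, c, a>.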
6.4.17 The Operator ``*''
This operator may only be used in the following ways:
- The expression ``record * symbol'' yields the
record which only contains the attribute of record whose name is symbol.
- The expression ``record1 * record2'' yields a
record which consists of all attributes of record1 and record2. If
record1 and record2 have common attribute names, the
corresponding attributes in the result record will have the attribute values
from record1, in contrast to the operator ``+''.
- The expression ``record * list'', where list is a list of symbols, yields the record which only contains the
attributes of record whose names are in list.
- The expression ``list1 * list2'' yields the
``intersection'' of the lists interpreted as multi-sets; if an element is m
times contained in list1
and n times contained in list2, it will be min(m, n) times
contained in the result.
- The expression ``number1 * number2'' yields the
product of number1 and number2.
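The multi-set intersection of two lists can be sketched in Python like this. Elements are modelled as hashable values, and the result keeps the order of list1; both points are assumptions, since the text only fixes how often each element occurs. `list_times` is a hypothetical name.

```python
from collections import Counter


def list_times(list1: list, list2: list) -> list:
    """Model list1 * list2: each element occurs min(m, n) times,
    where m and n are its counts in list1 and list2."""
    available = Counter(list2)
    result = []
    for element in list1:
        if available[element] > 0:
            available[element] -= 1   # pair it with one appearance in list2
            result.append(element)
    return result
```

For list1 = <a, a, a, b> and list2 = <a, b, b, a>, the element a occurs min(3, 2) = 2 times and b occurs min(1, 2) = 1 time in the result.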
6.4.18 The Operator ``/''
This operator may only be used in the following ways:
- The expression ``list1 / list2'' yields the
list which contains all elements of list1 which are not elements of
list2.
- The expression ``list / number'' yields the list
which contains all elements of list without the leftmost number elements, if number is positive, or without the rightmost
-number elements, if number is negative.
- The expression ``number1 / number2'', where number2 is not 0, yields the quotient of number1 and number2.
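The two list forms of ``/'' can be sketched in Python as follows. The sketch assumes that the number operand is an integer and that a value of 0 removes nothing (the text only covers the positive and the negative case); `list_divide` is a hypothetical name.

```python
def list_divide(list1: list, operand):
    """Model list / list and list / number."""
    if isinstance(operand, list):
        # list1 / list2: drop every element that also occurs in list2
        return [element for element in list1 if element not in operand]
    if operand >= 0:
        # drop the leftmost `operand` elements (assumption: 0 drops none)
        return list1[operand:]
    # negative number: drop the rightmost -operand elements
    return list1[:operand]
```

Note the contrast with ``-'': `list_divide(["a", "b", "a", "c"], ["a"])` removes every a, whereas the multi-set difference ``list1 - list2'' would remove only the first one.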
6.5 Conditions
$$ Condition ::= Comparison ({"and" Comparison} | {"or" Comparison}) .
$$ Comparison ::= ["not"] (Expression [Comparison-Operator Expression]
| Match-Comparison) .
$$ Comparison-Operator ::= "=" | "/=" | "~" | "/~" | "in" | "less" | "greater"
| "less_equal" | "greater_equal" .
A condition can either be true or false, as in ``Verb = Verb'' or ``Verb = Noun'', respectively.
An expression that evaluates to one of the symbols yes or no is
a valid condition.
A condition can be used anywhere a (non-constant) value is needed. It will
evaluate to yes or no. In this case, the condition must be
surrounded by parentheses.
6.5.1 The Operators ``='' and ``/=''
The condition ``expr1 = expr2'' tests whether the
expressions expr1 and expr2 are equal. There are several
possibilities:
- expr1 and expr2 are strings, symbols or numbers: In this
case expr1 and expr2 must be identical.
- expr1 and expr2 are lists: In this case expr1
and expr2 must match element by element.
- expr1 and expr2 are records: In this case expr1
must contain the same attributes (though not necessarily in
the same order) as expr2.
For nested structures, equality is tested recursively.
If expr1 and expr2 do not have the same type, the test
results in an error; only the symbol nil can be compared to any value.
The comparison ``expr1 /= expr2'' holds iff the
comparison ``expr1 = expr2'' does not hold.
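The equality test described above can be sketched recursively in Python. The model is an illustration only: symbols are represented by a small hypothetical `Symbol` class so that they can be told apart from strings, lists and records by Python lists and dicts. It assumes that a type mismatch inside a nested structure is also an error, and it does not model the case-insensitivity of symbol names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Symbol:
    name: str


NIL = Symbol("nil")


def kind(value) -> str:
    """Name the Malaga type that a modelled value stands for."""
    if isinstance(value, Symbol):
        return "symbol"
    if isinstance(value, str):
        return "string"
    if isinstance(value, (int, float)):
        return "number"
    if isinstance(value, list):
        return "list"
    if isinstance(value, dict):
        return "record"
    raise TypeError(f"not a modelled Malaga value: {value!r}")


def malaga_equal(expr1, expr2) -> bool:
    """Model expr1 = expr2."""
    if expr1 == NIL or expr2 == NIL:
        return expr1 == expr2            # nil may be compared to any value
    if kind(expr1) != kind(expr2):
        raise TypeError("operands of '=' have different types")
    if kind(expr1) == "list":            # element by element, in order
        return len(expr1) == len(expr2) and all(
            malaga_equal(a, b) for a, b in zip(expr1, expr2))
    if kind(expr1) == "record":          # same attributes, any order
        return expr1.keys() == expr2.keys() and all(
            malaga_equal(expr1[name], expr2[name]) for name in expr1)
    return expr1 == expr2
```

The comparison ``/='' is then simply `not malaga_equal(expr1, expr2)`.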
6.5.2 The Operators ``less'', ``less_equal'', ``greater'', ``greater_equal''
A condition of type ``expr1 operator expr2'' compares
two numbers. Here, operator can have the following values:
operator      | meaning
less          | <
less_equal    | ≤
greater       | >
greater_equal | ≥
If either expr1 or expr2 is not a number, an error will be
reported.
6.5.3 The Operators ``~'' and ``/~''
For a comparison ``expr1 ~ expr2'', expr1
and expr2 must be lists or symbols.
If expr1 and expr2 are symbols, the lists of their atomic
symbols (atoms(expr1) and atoms(expr2)) will be used for the comparison instead of the symbols themselves.
The comparison tests whether the lists are congruent, that is, whether
they have an element in common.
The comparison ``expr1 /~ expr2'' holds iff the
comparison ``expr1 ~ expr2'' does not hold.
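The congruence test can be sketched in Python as follows. Symbols are modelled as strings and lists as Python lists. The table `MULTI_SYMBOLS` is a hypothetical stand-in for the multi-symbol definitions of a symbol file, and the function names are illustrative.

```python
# Hypothetical symbol table: multi-symbol name -> list of atomic symbols.
MULTI_SYMBOLS = {"singular_plural": ["singular", "plural"]}


def atoms(symbol: str) -> list:
    """Model atoms(symbol): the atomic list of a multi-symbol,
    or <symbol> if the symbol is atomic."""
    return MULTI_SYMBOLS.get(symbol, [symbol])


def congruent(expr1, expr2) -> bool:
    """Model expr1 ~ expr2: true if both sides share an element."""
    left = atoms(expr1) if isinstance(expr1, str) else expr1
    right = atoms(expr2) if isinstance(expr2, str) else expr2
    return any(element in right for element in left)
```

With this table, ``singular_plural ~ singular'' holds, while ``singular ~ plural'' does not.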
6.5.4 The Operator ``in''
The operator ``in'' can only be used in the following ways:
- The condition ``symbol in record'' holds iff record contains an attribute named symbol.
- The condition ``value in list'' holds iff value is an element of list.
6.5.5 The matches Condition (Regular Expressions)
$$ Match-Comparison ::= Expression "matches" "(" Segment {"," Segment} ")".
$$ Segment ::= [Variable ":"] Constant-Expression .
The condition
expr matches (pattern)
interprets pattern as a pattern (a regular expression) and
tests whether expr matches pattern. Patterns are defined as
follows:
- pattern ::= alternative { ``|'' alternative }
The string must be identical with one of the alternatives.
- alternative ::= { atom [ ``*'' | ``?'' | ``+'' ] }
An alternative is a (possibly empty) sequence of atoms. An atom in a pattern
corresponds to a character in a string. By using an optional postfix operator
it is possible to specify for any atom how often it may be repeated within
the string at that location: zero times or once (``?''), at least once (``+''),
or arbitrarily often, including zero times (``*'').
- atom ::= ``('' pattern ``)''
A pattern may be grouped by parentheses.
- atom ::= ``['' [ ``^'' ] range { range } ``]''
A character class. It represents exactly one character from one of the
ranges. If the symbol ``^'' is the first one in the class, the
expression represents exactly one character that is not contained in
one of the ranges.
- atom ::= ``.''
Represents any character.
- atom ::= character
Represents the character itself.
- range ::= character1 [ ``-'' character2 ]
The range contains any character with a code at least as big as the code of
character1 and not bigger than the code of character2. The
code of character2 must be at least as big as the code of character1. If character2 is omitted, the range only contains
character1.
- character ::= Any character except ``*?+[]^-.\|()''
To use one of the characters ``*?+[]^-.\|()'', it must be preceded by
a ``\'' (escape character).
You can divide the pattern into segments:
$surf matches ("un|in|im|ir|il", ".*", "(en)?")
is the same as
$surf matches ("(un|in|im|ir|il).*(en)?").
A section of the string can be stored in a variable by prefixing the respective
pattern with ``variable_name:'', as in
$surf matches ($a: "un|in|im|ir|il", ".*")
The variables defined by pattern matching are only defined in the statement
sequence which is being executed if the pattern matching is successful. A
matches condition that is
- contained in a disjunction (an or condition),
- contained in a negation (a not condition), or
- used as a value (e.g. in an assignment)
may not have variable definitions in it.
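The segment mechanism can be illustrated with Python's `re` module. The sketch assumes that the pattern subset described above can be passed to Python's regular expression syntax unchanged, which holds for the examples shown but is not a statement about Malaga's own matcher; in particular, which part of the string ends up in a variable when several splits are possible may differ. Variable names are given without the ``$'' because Python group names must be plain identifiers. The function name `matches` is hypothetical.

```python
import re


def matches(text: str, *segments):
    """Model `text matches (segment, ...)`.

    A segment is either a pattern string or a (name, pattern) pair that
    stores the matched section under name. Returns None if the match
    fails, otherwise a dict of the stored sections.
    """
    parts = []
    for segment in segments:
        if isinstance(segment, tuple):
            name, pattern = segment
            parts.append(f"(?P<{name}>{pattern})")   # segment with variable
        else:
            parts.append(f"(?:{segment})")           # plain segment
    found = re.fullmatch("".join(parts), text)
    return None if found is None else found.groupdict()
```

Each segment is wrapped in its own group, which is why the segmented form behaves like the single pattern with ``(un|in|im|ir|il)'' in parentheses. A successful match without variables returns an empty dict, so callers should test the result with `is not None`.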
6.6 The Operators not, and, and or
Conditions can be combined logically:
- The condition ``not cond'' is true if condition cond is false.
- The condition ``cond1 and cond2 and cond3
and ...'' is true if all conditions cond1, cond2, cond3, ... are true. The conditions are only tested until one of them
is false (short-cut evaluation).
- The condition ``cond1 or cond2 or cond3
or ...'' is true if at least one of the conditions cond1, cond2, cond3, ... is true. The conditions are only tested until
one of them is true (short-cut evaluation).
The operator not takes exactly one argument. A complex condition that is to be
negated has to be put in parentheses ``( )''.
The operators and and or may not be mixed within one condition; otherwise the order of
evaluation would be ambiguous. To combine them, the inner condition has to be put in parentheses
``( )''.
6.7 The Symbol Table
$$ Symbol-Definition ::= Symbol [":=" "<" Symbol {"," Symbol} ">"] ";".
Every symbol used in a grammar has to be defined exactly once in the symbol table. Every symbol must be followed by a semicolon:
verb; noun; adjective;
Symbols that are defined this way are called atomic symbols. A
symbol can also be defined as a multi-symbol. Then the entry for this
symbol has the following format:
symbol := list;
The list for this symbol must consist of at least two atomic symbols,
all different from those that have already been defined. This list will be
used by the operators ``~'' and ``/~'', ``atoms'', and
``multi''. The lists in the symbol table must all be different; two lists may
not differ merely in the order of their elements.
6.8 The Initial State
$$ Initial ::= "initial" Constant-Expression "," Rule-Set ";" .
$$ Rule-Set ::= "rules" (Rules {"else" Rules} | "(" Rules {"else" Rules} ")") .
$$ Rules ::= Rule-Name {"," Rule-Name} .
The initial state in a combination rule file is defined as follows:
initial value,
rules rule1, rule2, ...;
The initial state specifies a category for the empty word start (or sentence
start) in a combi rule file; the rules listed behind rules are applied in
parallel to combine the empty word (sentence) start with the first allomorph
(word form). The rules may be enclosed in parentheses.
If you want rules to be executed only if no other rule
has been successful, you can put their names behind the other rules'
names and write an else in front of them:
initial value, rules rule1, rule2
else rule3, rule4 else ...;
If neither of the normal rules rule1 and rule2 has been
successful, rule3 and rule4 are executed. If these rules also
fail, the next rules are executed, and so on.
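The fallback behaviour of a rule set can be sketched in Python. The sketch is a model under stated assumptions: a rule set is given as a list of groups (the parts separated by else), and `apply_rule` is a hypothetical callback that returns the list of results a rule produces, with an empty list meaning that the rule was not successful.

```python
def apply_rule_set(groups, apply_rule):
    """Model `rules g1 else g2 else ...`.

    All rules of a group are applied; the next group is tried only if
    no rule of the current group has been successful.
    """
    for group in groups:
        results = []
        for rule in group:
            results.extend(apply_rule(rule))   # rules of one group run side by side
        if results:
            return results
    return []
```

For the rule set ``rules rule1, rule2 else rule3, rule4'', the groups are `[["rule1", "rule2"], ["rule3", "rule4"]]`; rule3 and rule4 are only consulted if both rule1 and rule2 yield nothing.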
6.9 The Constant Definition
$$ Constant-Definition ::= "define" Constant ":=" Constant-Expression ";" .
A constant definition is of the form
define @constant := expr;
The constant expression expr will be evaluated and the constant @constant will be defined to have this value. The constant must not have been
defined previously. The constant is valid from this definition up to the end of
the rule file.
$$ Rule ::= Rule-Type Rule-Name "(" Variable {"," Variable} ")" ":"
$$ {Statement} "end" [Rule-Type] [Rule-Name] ";" .
$$ Rule-Type ::= "allo_rule" | "combi_rule" | "end_rule" | "pruning_rule"
$$ | "robust_rule" | "input_filter" | "output_filter" | "subrule" .
A rule is a sequence of statements that is executed as a unit:
combi_rule name ($param1, $param2, ...):
  statement1
  statement2
  ...
end name;
A rule has to begin with one of the keywords allo_rule, combi_rule, end_rule, pruning_rule, robust_rule, input_filter, output_filter or subrule, followed by the rule name and the
parameter list, a list of variable names in parentheses. The variables
will be assigned the parameter values when the rule is executed. The number of
parameters depends on the rule type. The rule types have the following
meanings:
- ``allo_rule ($lex_entry)'': An allo-rule
must occur exactly once in an allomorph rule file. It analyses a lexical
entry and must generate one or more allomorph entries (via result). An
allomorph rule has one parameter, namely the lexicon entry.
- ``combi_rule ($start, $next, $surf, $index)'':
Any number of combi-rules may occur in a combi-rule file. Before
processing such a rule, the next segment (either the next allomorph or the
next word form) is read. The first parameter is the Start category, the
second is the Next category, the third is the Next surface, and the fourth is
the Next index. The third and the fourth parameter are optional. A combi-rule
may state a successor rule set or accept the analysed input (both via result).
- ``pruning_rule ($list)'': A pruning-rule may occur
at most once in a syntax rule file. During syntax analysis, it can decide
which states are still valid and which are to be deleted. The parameter is a
list of categories of the states that have consumed the same input so far.
The pruning-rule must execute a return statement with a list of yes- and no-symbols. Each state in $list corresponds to a
symbol in the result list. If the symbol is yes, the corresponding
state is preserved. If the symbol is no, the state is abandoned.
- ``robust_rule ($surface)'': A
robust-rule may appear at most once in a morphology rule file. If robust
analysis has been switched on by the robust command, and a word form
could not be recognised by the combi-rules, the robust-rule is executed with
the surface of the word form as its parameter. A robust-rule can accept the
word form via result.
- ``input_filter ($cat_list)'': An input-filter may
occur at most once in a syntax rule file. The input-filter is called after a
word form has been analysed. It gets one parameter, namely the list of the
analysis results, and it transforms it to one or more filtered results (via
result).
- ``output_filter ($cat_list)'': An output-filter
may occur at most once in any rule file.
  - In allo-rule files: The output-filter is called after all lexicon entries
  have been processed by the allo-rules. The filter is called for every
  allomorph surface. It gets one parameter, namely the list of the generated
  categories with that surface, and it transforms it to one or more filtered allomorph
  categories (via result).
  - In combi-rule files: The output-filter is called after an item has
  been analysed. It gets one parameter, namely the list of the analysis
  results, and it transforms it to one or more filtered results (via result).
- ``subrule ($param1, $param2, ...)'': Any
number of subrules may occur in any rule file. A subrule can be invoked from
other rules and it must return a value to this rule via return. It can
have any number of parameters (at least one).
If a rule is executed, all statements in the rule are processed sequentially.
After that, the rule execution is terminated. The if statement,
the foreach statement, and the parallel statement may change the
processing order. Special conditions apply if:
1. A condition in a test statement does not hold. In this case the
processing of the rule path is terminated. This is not an error.
2. The fail statement was executed. This is a special case of case 1.
3. An assert condition does not hold. In this case the processing of
the whole grammar is terminated and an error message is displayed. This rule
termination can be used to find categorisation or programming flaws in the
rule system or in the lexicon.
4. The error statement was executed. This is a special case of
case 3.
5. The return statement was executed in a subrule or in a pruning
rule. In a subrule, this terminates the subrule in the current rule path and
immediately returns to the calling rule. In a pruning rule, this terminates
the pruning rule.
6.11 Statements
$$ Statement ::= Assert-Statement | Assignment
$$ | Choose-Statement | Define-Statement
$$ | Error-Statement | Fail-Statement | Foreach-Statement
$$ | If-Statement | Parallel-Statement | Repeat-Statement
$$ | Require-Statement | Result-Statement | Return-Statement .
A rule body contains a sequence of statements.
The statements are the assignment and the statements beginning with
assert, choose, define, error,
fail, foreach, if, parallel, repeat,
require, result, and return.
6.11.1 The assert Statement
$$ Assert-Statement ::= ("assert" | "!") Condition ";" .
The statement
assert condition;
or
! condition;
tests whether condition holds. If this is not the case, an error
message with the line number in the source code is printed and the processing
of all paths is terminated.
The assert statement should be used to check whether there are structural
flaws in the lexicon or the rule system.
6.11.2 The Assignment
$$ Assignment ::= Variable {"." Value}
$$ (":=" | ":=+" | ":=-" | ":=*" | ":=/") Expression ";" .
To set the value of an already defined variable to a different value, use a
statement of the following form:
$var := expr;
The expression expr is evaluated and the result is assigned to the
variable $var. The variable must have already been defined.
You can optionally specify a path behind the variable that is to be set by an
assignment:
$var.part1.part2 := value;
In this case, only the value of ``$var.part1.part2'' will be set to value; the remainder of the variable
$var will be unchanged. Each part must be an expression that
evaluates to a symbol, a number or a list of symbols and numbers.
You can also use one of four other assignment operators instead of the operator
``:='': The statement ``$var :=+ value;'' is a
shorthand for ``$var := $var + value;''; the
same holds analogously for the assignment operators ``:=-'', ``:=*'', and
``:=/''. Here, $var may be followed by a path again.
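An assignment with a path changes only the addressed part of the variable. The following Python sketch models this as a functional update that builds a new value and leaves the old one intact. Records are modelled as dicts and lists as Python lists; list positions are assumed to start at 1, and a path that does not exist in the value raises an error, both of which are assumptions not taken from this section. `assign_path` is a hypothetical name.

```python
def assign_path(value, path, new_value):
    """Model `$var.part1.part2 := new_value` and return the new value
    of $var. `path` is the list of parts."""
    if not path:
        return new_value
    head, rest = path[0], path[1:]
    if isinstance(value, dict):
        updated = dict(value)                      # copy this level only
        updated[head] = assign_path(value[head], rest, new_value)
        return updated
    if isinstance(value, list):
        updated = list(value)
        # Assumption: list positions are counted from 1.
        updated[head - 1] = assign_path(value[head - 1], rest, new_value)
        return updated
    raise TypeError("a path can only lead through records and lists")
```

Because every level on the path is copied, other variables that hold the old value are unaffected, which matches the statement that Malaga is free of side effects.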
6.11.3 The choose Statement
$$ Choose-Statement ::= "choose" Variable "in" Expression ";" .
The choose statement chooses an element of a list. Its format
is:
choose $var in expr;
For every element in the list expr a rule path is created; in this rule
path the element is stored in the variable $var. Thus the number of
rule paths can multiply. If, for example, expr has the value <A,
B, C>, the currently processed rule path has three continuations: In the
first one $var has the value A, in the second one it has the
value B and in the third one it has the value C. The three paths
behave independently from now on; some may fail while others may be processed
successfully, and the results can be different.
The choose statement can also be used for records. In that case, the
variable $var gets a different attribute name of the record expr in each path.
The choose statement also works for numbers:
- If expr is a positive number n, the variable $var is assigned one of the numbers 1, 2, ..., n
in each path.
- If expr is a negative number -n, the variable $var is assigned one of the numbers -1, -2, ..., -n
in each path.
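The branching performed by choose can be sketched in Python. A rule path is modelled as a dict of variable bindings, records as dicts, and lists as Python lists. The behaviour for the number 0 (no continuation at all) is an assumption, as the text only covers positive and negative numbers, and the function names are hypothetical.

```python
def choose_values(expr):
    """Return the values that $var takes, one per rule path."""
    if isinstance(expr, dict):
        return list(expr)                 # attribute names of a record
    if isinstance(expr, list):
        return list(expr)                 # elements of a list
    if isinstance(expr, int):
        step = 1 if expr > 0 else -1      # 1..n or -1..-n
        return list(range(step, expr + step, step))
    raise TypeError("choose needs a list, a record or a number")


def branch(variables: dict, name: str, expr):
    """Model `choose name in expr;`: one independent copy of the
    variable bindings per chosen value."""
    return [{**variables, name: value} for value in choose_values(expr)]
```

Each returned dict is a separate copy, so a later change in one path does not affect the others. For `expr = ["A", "B", "C"]` the current path gets three continuations, as in the example above.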
6.11.4 The define Statement
$$ Define-Statement ::= "define" Variable ":=" Expression ";" .
A define statement is of the form
define $var := expr;
The expression expr is evaluated and the result is assigned to the
variable $var. The variable must not have been defined before this statement;
it is defined by the statement and only exists until the statement sequence in
which the definition is situated has been processed fully.
6.11.5 The error Statement
$$ Error-Statement ::= "error" String ";" .
The statement error terminates the execution of all paths and
prints out a given error message string and the line of the source text.
error message;
6.11.6 The fail Statement
$$ Fail-Statement ::= "fail" ";" .
The fail statement terminates the current rule path. Its format is:
fail;
6.11.7 The foreach Statement
$$ Foreach-Statement ::= "foreach" Variable "in" Expression ":" {Statement}
$$ "end" ["foreach"] ";" .
You may wish to process all elements of a list or a record sequentially within
one rule path. The foreach statement serves this purpose. It has the following
format:
foreach $var in expr: statements
end foreach;
The first, second, third, ... elements of the list expr are assigned to $var
in sequence, and the statement sequence statements is executed for each of
these assignments.
Each time the statements are executed, the variable $var is defined anew. Its scope is the block statements.
The foreach statement also works for records. In that case, the variable
$var is assigned the first, second, ... attribute name of the record
expr.
The foreach statement also works for numbers:
- If expr is a positive number n, the variable $var is assigned the numbers 1, 2, ..., n
sequentially.
- If expr is a negative number -n, the variable $var is assigned the numbers -1, -2, ..., -n
sequentially.
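As an illustration, the following sketch counts how often a symbol occurs in a list. The names are hypothetical; the assignment statement and the ``='' comparison are assumed to behave as defined elsewhere in this chapter.

```malaga
define $numbers := <singular, plural, plural>;
define $count := 0;
foreach $number in $numbers:
  if $number = plural then
    $count := $count + 1;        # executed for the second and third element
  end if;
end foreach;
# Here $count is 2; $number is no longer defined.
```

Unlike choose, foreach stays within a single rule path, so the count accumulated in $count is available after the loop.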
6.11.8 The if Statement
$$ If-Statement ::= "if" Condition "then" {Statement}
$$ {"elseif" Condition "then" {Statement}}
$$ ["else" {Statement}] "end" ["if"] ";" .
An if statement has the following form:
if condition1 then statements1
elseif condition2 then statements2
else statements3
end if;
The second line may be repeated any number of times (including zero), and the
third line may be omitted.
First, condition1 is evaluated. If it is satisfied, the
statement sequence statements1 is executed.
If the first condition is not satisfied, condition2 is evaluated; if
it is satisfied, statements2 is executed. This procedure is
repeated for every elseif part until a condition is satisfied.
If neither the if condition nor any elseif condition is satisfied, the
statement sequence statements3 is executed (if it exists).
After the if statement has been processed, the next statement is executed.
The if after the end may be omitted.
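A minimal sketch of an if statement without an else part. It assumes that $stem holds a string and that $category is a record with the made-up attributes person, number and tense:

```malaga
define $form := $stem;
if $category.person = third and $category.number = singular then
  $form := $stem + "s";
elseif $category.tense = past then
  $form := $stem + "ed";
end if;
# If neither condition is satisfied, $form still holds the bare stem.
```

Note that $form is defined before the if statement: a variable defined inside one of the branches would cease to exist at the end of that branch.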
6.11.9 The parallel Statement
$$ Parallel-Statement ::= "parallel" {Statement} {"and" {Statement}}
$$ "end" ["parallel"] ";" .
Using the parallel statement more than one continuation of an
analysis can be generated. Its format is:
parallel
statements1
and
statements2
and
statements3
...
end parallel;
This creates as many rule paths as there are statement sequences. In the first
rule path, statements1 are executed, in the second one statements2
are executed, etc. Each rule path continues by executing the statements
following the parallel statement.
The keyword parallel after the end can be omitted.
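For example, the following sketch (with a made-up variable) creates two rule paths that differ only in the value of $ending:

```malaga
define $ending := "";
parallel
  $ending := "s";                # first rule path
and
  $ending := "es";               # second rule path
end parallel;
# Both rule paths continue here, one with each value of $ending.
```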
6.11.10 The repeat Statement
$$ While-Statement ::= "repeat" {Statement} "while" Condition ";" {Statement}
$$ "end" ["while"] ";" .
You may wish to repeat a sequence of statements while a specific condition
holds. This can be realised by the repeat loop. It has the following form:
repeat
statements1
while condition ;
statements2
end while;
The statements statements1 are executed. Then, condition
is tested. If it holds, the statements2 are
executed and the repeat statement is executed again. If condition
does not hold, execution proceeds after the repeat statement.
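The order of execution can be seen in the following sketch, which multiplies the numbers from 1 to 5. The variable names are made up, and the arithmetic operators and ``/='' are assumed to work on numbers as described elsewhere in this chapter.

```malaga
define $i := 1;
define $product := 1;
repeat
  $product := $product * $i;     # statements1: executed in every round
while $i /= 5;                   # the loop is left once $i has reached 5
  $i := $i + 1;                  # statements2: executed only if the test held
end while;
# Here $product is 120 (1 * 2 * 3 * 4 * 5) and $i is 5.
```

Because the test sits in the middle of the loop, statements1 is executed five times and statements2 only four times.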
6.11.11 The require Statement
$$ Require-Statement ::= ("require" | "?") Condition ";" .
A statement of the form
require condition;
or
? condition;
tests whether condition is true. If it is not, the rule path
is terminated without an error message. Require statements should be used to
decide whether the word start (or sentence start) read so far is grammatical
according to the interpretation of the rule path.
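A sketch of both notations, assuming that $start and $next are records with the made-up attributes number and case, and that ``in'' tests membership in a list:

```malaga
# Continue this rule path only if both categories agree in number.
require $start.number = $next.number;

# Short form with the same meaning as require.
? $next.case in <nominative, accusative>;
```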
6.11.12 The result Statement
$$ Result-Statement ::= "result" Expression ["," (Rule-Set | "accept")] ";" .
- In combi-rules:
- The statement
result expr,
rules rule1, rule2, ...;
specifies the result category of the rule and the successor rules. The value
expr is the result category. After the keyword rules, the names
of all successor rules are listed. For every successor rule that is
executed, a new rule path is created. The rule set may be enclosed in
parentheses.
If you want successor rules to be executed only if no other rule has been
successful, you can put their names behind the other rules' names and write an
else in front of them:
rules rule1, rule2
else rule3, rule4 else ...;
If none of the normal rules (here: rule1 and rule2) has been
successful, rule3 and rule4 are executed. If these rules also fail,
the next rules are executed, and so on. A rule has been successful if it has
executed at least one result statement.
- In combi-rules and end-rules:
- If the input is to be accepted by the result statement (and therefore no successor rules are to be called), the following format has to be used:
result expr, accept;
If this statement is reached in a rule path, the input is accepted as
grammatically well-formed. The value expr is returned as the result of
the morphological or syntactic analysis.
- In filters and robust-rules:
- The format of a result statement in a filter or robust-rule is:
result expr;
If this statement is reached, the value expr is used as a result of the
executed rule.
- In allo-rules:
- The format of the result statement in an allo-rule is:
result surface, category;
It creates an entry in the allomorph lexicon. The allomorph surface
surface must be a string; category is the categorical
information of the allomorph.
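The four formats side by side, as a sketch only. The statements belong to different kinds of rules and do not form one statement sequence; the rule names inflection, derivation and compounding and the variable $category are hypothetical.

```malaga
# Combi-rule: continue with inflection or derivation; compounding is tried
# only if neither of these two rules has been successful.
result $category, rules inflection, derivation else compounding;

# Combi-rule or end-rule: accept the input, call no successor rule.
result $category, accept;

# Filter or robust-rule: deliver $category as a result of the rule.
result $category;

# Allo-rule: enter the allomorph "hous" with its category into the lexicon.
result "hous", $category;
```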
6.11.13 The return Statement
$$ Return-Statement ::= "return" Expression ";" .
In a subrule, the return statement is of the
following form:
return expr;
The value of expr is returned to the rule that invoked this subrule and
the subrule execution is finished.
In a pruning rule, the return statement is of the same form. Here, expr must be a list of yes- and no-symbols. Each state
in the category list, which is the pruning rule parameter, corresponds to a
symbol in the result list. If the symbol is yes, the corresponding state
is preserved. If the symbol is no, the state is abandoned.
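A sketch of a subrule that uses return. The rule name, the attribute number and the multi symbol singular_plural are made up for the example; the rule header and the end line follow the rule syntax defined elsewhere in this chapter, and ``in'' is assumed to test whether a record has a given attribute.

```malaga
subrule number_of ($category):
  if number in $category then
    return $category.number;     # the subrule is finished here
  end if;
  return singular_plural;        # reached only if the attribute is missing
end subrule;
```

A rule could then invoke the subrule in an expression, for example in define $number := number_of ($category);.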
6.12 Files
A Malaga grammar system comprises several files: a symbol file, a lexicon file,
an allomorph rule file, a morphology rule file, an extended symbol file
(optional), and a syntax rule file (optional). The type of a file is
indicated by the suffix of its file name. A grammar for the English language may
consist of the files ``english.sym'', ``english.lex'', ``english.all'', ``english.mor'' and ``english.syn''.
6.12.1 The Symbol File
$$ Symbol-File ::= {Symbol-Definition | Include} .
A symbol file has the suffix ``.sym''. It contains the symbol table.
6.12.2 The Extended Symbol File
$$ Extended-Symbol-File ::= Symbol-File .
An extended symbol file has the suffix ``.esym''. It contains an
additional symbol table that contains symbols that may only be used in the
syntax rule file.
6.12.3 The Lexicon File
$$ Lexicon-File ::= {Constant-Definition | Constant-Expression ";"} .
A lexicon file has the suffix ``.lex''. It consists of any number of
values and constant definitions, each terminated by a semicolon. Each value
stands for a lexical entry. A value may contain named constants and the
operators ``.'', ``+'', ``-'', ``*'', and ``/''.
The format of the lexical entries is free, although it should be consistent
with the conception of the whole rule system.
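A lexicon file might look like the following sketch. The attribute names and symbols are made up and would have to be declared in the symbol file; constants are assumed to be written with a leading ``@'', and ``+'' is assumed to join two records with different attributes.

```malaga
# A constant definition: information shared by all noun entries.
define @noun := [class: noun];

# Two lexical entries, each a record terminated by a semicolon.
@noun + [surface: "house", plural: "houses"];
@noun + [surface: "mouse", plural: "mice"];
```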
6.12.4 The Allomorph Rule File
$$ Rule-File ::= {Rule | Constant-Definition | Initial | Include} .
$$ Allomorph-Rule-File ::= Rule-File .
The allomorph lexicon is generated from the base form lexicon by applying the
allo-rule to the base form entries. The allomorph rule file has
the suffix ``.all'' and consists of one allo-rule, an optional
output-filter, and any number of subrules and constant definitions.
For every lexical entry, the allo-rule is executed with the value of the
lexicon entry as its parameter. The allo-rule can generate allomorphs using the
result statement.
After all allomorphs have been produced, the output-filter is executed once for
each surface in the (intermediate) allomorph lexicon. As its parameter, the
output-filter gets the list of categories that share that surface. An entry in
the final allomorph lexicon is created every time the result statement is
executed. The surface cannot be changed by the output-filter.
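A minimal sketch of an allo-rule that produces two allomorphs per lexical entry. It assumes that every lexical entry is a record with a string attribute surface; the rule name is hypothetical, and the rule header and end line follow the rule syntax defined elsewhere in this chapter.

```malaga
allo_rule noun_forms ($entry):
  parallel
    # First allomorph: the surface as it stands in the lexicon.
    result $entry.surface, $entry;
  and
    # Second allomorph: the surface with "s" appended.
    result $entry.surface + "s", $entry;
  end parallel;
end allo_rule;
```

For an entry with the surface "house", this would create the allomorph entries "house" and "houses", both carrying the category of the lexical entry.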
6.12.5 The Combi-Rule Files
$$ Combi-Rule-File ::= Rule-File .
A grammar system includes up to two combination rule files: one for
morphological combination with the suffix ``.mor'' and (optionally) one
for syntactic combination with the suffix ``.syn''.
A combination rule file consists of an initial state and any number of
combi-rules, subrules, and constant definitions. A syntax rule
file may contain one optional pruning-rule, one optional input-filter and one
optional output-filter; a morphology rule file may contain
one optional robust-rule and one optional output-filter.
Beginning with the rules listed in the initial state, the rules and
their successors are processed until a result statement with the
keyword accept is encountered in every path. A path dies if there is no
more input (from the lexicon or from the morphology) that can be processed.
In morphology, if analysis has created no result and robust analysis has been
switched on, the robust-rule will be called with the analysis surface and can
create a result.
In syntax, when a new word form has been imported from morphology, the
input-filter can take a look at its categories and create new result
categories.
In syntax, if a pruning-rule is present and pruning has been activated, the
concatenation of the next word form is preceded by the following step: The
categories of all current LAG states are merged into a list, which is the
parameter of the pruning rule. The pruning-rule must execute a return
statement with a list of yes- and no-symbols. Each state in the
category list corresponds to a symbol in the result list. If the symbol is yes, the corresponding state is preserved. If the symbol is no, the
state is abandoned.
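The pruning step can be sketched as follows. The example assumes that every category is a record with a made-up attribute finite whose value is yes or no, and that ``+'' concatenates lists; the rule name is hypothetical.

```malaga
pruning_rule keep_finite ($categories):
  define $verdicts := <>;
  foreach $category in $categories:
    if $category.finite = yes then
      $verdicts := $verdicts + <yes>;   # preserve the corresponding state
    else
      $verdicts := $verdicts + <no>;    # abandon the corresponding state
    end if;
  end foreach;
  return $verdicts;
end pruning_rule;
```

The returned list has exactly one symbol per state, in the same order as the category list that was passed in.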
After analysis, the output-filter can take a look at all result categories and
create new result categories.