
Chapter 6  Definition of the Programming Language Malaga

6.1  Characterisation of Malaga

A Malaga rule file looks much like a program in a language such as Pascal or C (although those languages, of course, have no Left Associative Grammar formalism built in). As with compiled languages, a Malaga source file must be translated before execution. The generated Malaga code, however, is not machine code but an intermediate code that has to be executed (interpreted) by an analysis program.

We may characterise Malaga as follows, as far as programming structures and data structures are concerned:
structured values:
The basic values in Malaga are symbols (names that can be used e.g. for categories or subcategories), numbers (floating point numbers), and strings. Values can be combined into ordered lists or records (also known as feature structures). A value in a list or a record can itself be a list or a record. An ``ambiguous'' symbol like ``singular_plural'' can be assigned a list of symbols like ``<singular, plural>''; such a symbol is called a multi symbol.

structured statements:
In Malaga, the concept of statement blocks is implemented in a similar way as it is in the programming language Pascal. There are structured control statements to select or repeat a statement sequence. A variable is always defined locally, i.e. it only exists from the point where it has been defined up to the end of the statement sequence in which it has been defined.

no type restrictions:
Any value can be assigned to a variable and the programmer can freely define the structure of values.

no side effects:
Malaga is, unlike programming languages like Pascal or C, free of side effects. If a variable gets a value, no other variable will be changed. Analysis paths are independent of each other.

termination:
A Malaga grammar that contains no recursive subrules and no repeat statements is guaranteed to terminate, i.e. it can never hang in a loop.

variables:
In a define statement, a variable is defined and gets an initial value. Use an assignment to set a variable that has already been defined to a new value.

operators:
Many generative grammar theories and linguistic programming languages use the concept of unification of feature structures. Malaga does not use unification; instead, it offers operators to build lists or records (feature structures) explicitly. Since Malaga does without unification, analyses are much faster.

6.2  Malaga Source Texts

Source texts in Malaga are format-free; this means that between lexical symbols (strings, identifiers, keywords, numerals and symbols such as ``+'', ``~'' or ``:='') there may be blanks or newlines (whitespaces) or comments. Between two identifiers or two keywords there must be at least one whitespace to separate them syntactically.

In this documentation, the syntax of the source text components is defined formally in EBNF notation. The EBNF lines are printed in typewriter style and headed by ``$$''.

6.2.1  Comments

$$ Comment ::= "#" {printing_char} .
A comment may be inserted anywhere whitespace may be inserted. A comment begins with the symbol ``#'' and extends to the end of the line. Comments are ignored.

6.2.2  The include Statement

$$ Include ::= "include" String ";" .
A Malaga file may contain the statement
include "filename";
In a rule file, it can stand anywhere a rule can stand. In lexicon files, it can stand in place of a value; in symbol files, it can replace a symbol definition. The text of the included file is inserted verbatim at the location where the include statement occurs. The file name has to be given relative to the directory of the file that contains the include statement.

6.2.3  Identifiers

$$ Identifier ::= (Letter | "_" | "&" | "|") {Letter | Digit | "_" | "&" | "|"} .
In Malaga, names for variables, constants, symbols, and rules (see below for explanation) are called identifiers. An identifier may consist of uppercase and lowercase characters, the underscore ``_'', the ampersand ``&'', the vertical bar ``|'', and, from the second character on, also of digits. Uppercase and lowercase characters are not distinguished, i.e., Malaga is not case-sensitive. Malaga keywords must not be used as identifiers. A variable name must start with a ``$'', a constant name must start with a ``@''. The same identifier may be used as variable name, constant name, symbol name, or rule name independently; Malaga distinguishes them by the context in which they occur.

Valid identifiers would be ``Noun'', ``noun'' (the same as the first), ``R2D2'', ``Vb_aux'', ``A|G|D'', ``_INF''. Identifiers like ``2Noun'', ``Verb.Frame'', ``OK?'', ``_ INF'' are not valid.

6.3  Values

Malaga expressions can have values with very complex structures. To describe how those values can be composed from simple values a few rules suffice. Simple values in Malaga are symbols, numbers, and strings, which can be composed to form records and lists.

6.3.1  Symbols

$$ Symbol ::= Identifier .
The central data type in Malaga is the symbol. It is used for describing syntactic or semantic properties of an allomorph, a word, or a sentence. A symbol is an identifier like ``Verb'', ``reflexive'', ``Sing_1''. The symbols ``nil'', ``yes'', ``no'', ``symbol'', ``string'', ``number'', ``list'', and ``record'' are predefined and have special meanings.

6.3.2  Numbers

$$ Number ::= ["-"] Digit {Digit} ["." Digit {Digit}] ["E" ["+" | "-"] Digit {Digit}] .
A number in Malaga consists of an optional ``-'' sign, an integer part, an optional fractional part and an optional exponent of the form ``E[+|-]n''. There must be a dot between the integer part and the fractional part. Examples: ``0'', ``1'', ``1.0'', ``-13.75'', ``1.2E-5''.

6.3.3  Strings

$$ String ::= '"' {printing_char_except_double_quotes | '\"' | '\\'} '"' .
A string may consist of any number of characters (it may also be empty). It must be enclosed in double quotes and must not extend over more than one line. Within the double quotes there may be any combination of printable characters except the backslash ``\'' and the double quotes. These characters must be preceded by a ``\'' (escape character). Examples: "Hello", "He says: \"Great\"".

6.3.4  Lists

$$ List ::= "<" [Expression {"," Expression}] ">" .
A list is an ordered sequence of values. The values are separated by commas and enclosed in angle brackets:
<element1, element2, ...>
A list may also be empty. The elements in a list may be arbitrarily complex; they may also be lists or records.

6.3.5  Records

$$ Record ::= "[" [Symbol-Value-Pair {"," Symbol-Value-Pair}] "]" .
$$ Symbol-Value-Pair ::= Expression ":" Expression .
A record is a collection of attributes. An attribute consists of a symbol, the attribute name, and an associated attribute value, which can be an arbitrary Malaga value. The attribute name serves as an access key for the attribute value, so all attributes in a record must have different names.

Records are noted down as follows:
[name1: value1, name2: value2, ...]
where name1, name2, ... denote attribute names and value1, value2, ... the associated attribute values. Example: ``[Class: Verb, Reg: Reg, Val: dirObj]''.

A record with no attributes, ``[]'', is called the empty record.
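
The following sketch shows how lists and records can be nested. It is written against the EBNF above; the constant name and all symbols (``Verb'', ``Past'', ``Frame'' and so on) are hypothetical and would have to be defined in the symbol file.

```malaga
# Hypothetical constant holding a nested value.
define @sample_entry :=
  [Surf: "walked",
   Class: Verb,
   Forms: <Past, PastParticiple>,                  # a list inside a record
   Frame: [Subject: Nominative, Objects: <>]];     # a record with an empty list
```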

6.4  Expressions

$$ Expression ::= ["-"] Term {("+" | "-") Term} .
$$ Term ::= Factor {("*" | "/") Factor} .
$$ Factor ::= Value {"." Value} .
$$ Value ::= Symbol | String | Number | List | Record | Constant 
$$           | Subrule-Invocation | Variable | "(" Condition ")" .
$$ Constant-Expression ::= Expression .
An expression is the form in which a value is used in Malaga. Values can be written as follows:
[Surf: "he", Class: Pron, Case&Number: S3]
Variables (placeholders for values within a rule) can also be used as expressions:
$Pron
Furthermore, constants (placeholders for values in a rule file) can be used as expressions:
@combination_table
All three forms can be mixed:
[Surf: "he", Class: Pron, Case&Number: $result]
Furthermore, there are operators that modify values or combine two values to form a new value. Using these operators, complex values can be composed. All operators are left-associative, and they differ in priority (an operator with higher priority is applied before one with lower priority):
operator   priority
.          3
*, /       2
+, -       1
The order in which the operators are applied can be changed by grouping with parentheses ``()''.
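
As an illustration of the priorities, consider the sketch below. It assumes that ``+'' and ``*'' applied to two numbers denote ordinary addition and multiplication; the subrule name is hypothetical.

```malaga
# Hypothetical subrule; assumes "+" and "*" are arithmetic on numbers.
subrule priority_demo ($x):
  define $a := 1 + 2 * $x;     # "*" has higher priority: 1 + (2 * $x)
  define $b := (1 + 2) * $x;   # parentheses force the addition first
  return <$a, $b>;
end priority_demo;
```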

6.4.1  Variables

$$ Variable ::= "$" Identifier .
A variable is marked by a ``$'' preceding its name. The name may be any valid identifier. A variable is defined by the define statement; it receives a value and may from this point on be used in all expressions within the statement sequence. In such a statement sequence (and all subordinated statement sequences) a variable with the same name must not be defined again.

6.4.2  Constants

$$ Constant ::= "@" Identifier .
A constant is marked by an ``@'' preceding its name. The name may be any valid identifier. A constant is defined by a constant definition in a rule file, outside a rule. It is assigned a value and can be used in subsequent rules and constant definitions in that rule file.

6.4.3  Subrule Invocations

$$ Subrule-Invocation ::= Rule-Name "(" Expression {"," Expression} ")" .
$$ Rule-Name ::= Identifier .
A subrule is invoked when an expression ``subrule (value1, value2, ...)'' is evaluated. The expression yields the value that is returned by the return statement in the subrule. The number of parameters in a subrule invocation must match the number of parameters in the subrule definition.

A number of subrules are predefined. They are called functions, and each of them takes exactly one parameter.

6.4.4  The Function ``atoms''

The expression ``atoms(symbol)'' yields the list of atomic symbols for symbol. If symbol is not a multi-symbol, it yields the list <symbol>.

6.4.5  The Function ``capital''

The expression ``capital (string)'' yields yes if the first character of the string string is a capital letter, else it yields no.

6.4.6  The Function ``length''

The expression ``length (list)'' yields the number of elements in ``list''.

6.4.7  The Function ``multi''

The expression ``multi(list)'', where list is a list of symbols, yields the multi symbol whose atomic list corresponds to list. If list contains a single atomic symbol, the expression yields this symbol.

6.4.8  The Function ``set''

The expression ``set(list)'' yields a list which contains each element of list, but only once. That means, the list is converted to a set.

6.4.9  The Function ``switch''

The expression ``switch (symbol)'' yields the current value of the switch associated with ``symbol''. Use the option switch to change this value.

6.4.10  The Function ``symbol_name''

The expression ``symbol_name (symbol)'' yields the name of symbol as a string.

6.4.11  The Function ``transmit'' (malaga)

The expression ``transmit (value)'' writes value, converted to text format, to the transmit process via pipe and reads a value in text format from the transmit process via pipe. The answer is converted to the internal Malaga value format and returned as the result of the expression.

When this function is evaluated, the transmit process is started if it has not been started yet. The command line of the transmit process is specified by the option transmit.

6.4.12  The Function ``truncate''

The expression ``truncate (number)'' yields the largest integer number that is not greater than number.

6.4.13  The Function ``value_type''

The expression ``value_type (value)'' yields the type of value. The type information is coded as one of the symbols ``symbol'', ``string'', ``number'', ``list'', or ``record''.
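
A few of the functions described above are used in the sketch below. The subrule name and the attribute names ``Count'', ``Kind'', ``Whole'' and ``Capitalised'' are hypothetical; they, and the symbols A, B and C, are assumed to be defined in the symbol file.

```malaga
# Hypothetical subrule that only exercises predefined functions.
subrule function_demo ($word):
  define $n := length (<A, B, C>);   # number of elements: 3
  define $t := value_type ($word);   # symbol, string, number, list, or record
  define $i := truncate (2.75);      # largest integer not greater than 2.75: 2
  define $c := capital ("Hello");    # yes, since "H" is a capital letter
  return [Count: $n, Kind: $t, Whole: $i, Capitalised: $c];
end function_demo;
```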

6.4.14  The Operator ``.''

This operator may only be used in the following ways:

6.4.15  The Operator ``+''

This operator may only be used in the following ways:

6.4.16  The Operator ``-''

This operator may only be used in the following ways:

6.4.17  The Operator ``*''

This operator may only be used in the following ways:

6.4.18  The Operator ``/''

This operator may only be used in the following ways:

6.5  Conditions

$$ Condition ::= Comparison ({"and" Comparison} | {"or" Comparison}) .
$$ Comparison ::= ["not"] (Expression [Comparison-Operator Expression]
                  | Match-Comparison) .
$$ Comparison-Operator ::= "=" | "/=" | "~" | "/~" | "in" | "less" | "greater"
                           | "less_equal" | "greater_equal" .
A condition is either true or false, as in ``Verb = Verb'' or ``Verb = Noun'', respectively. An expression that evaluates to either of the symbols yes or no is also a valid condition.

A condition can be used anywhere a (non-constant) value is needed, in which case it evaluates to yes or no. When used this way, the condition must be surrounded by parentheses.

6.5.1  The Operators ``='' and ``/=''

The condition ``expr1 = expr2'' tests whether the expressions expr1 and expr2 are equal. There are several possibilities:
expr1 and expr2 are strings, symbols or numbers.
In this case expr1 and expr2 must be identical.
expr1 and expr2 are lists.
In this case expr1 and expr2 must match element by element.
expr1 and expr2 are records.
In this case expr1 must contain the same attributes as expr2 (though not necessarily in the same order).
For nested structures, equality is tested recursively.

If expr1 and expr2 do not have the same type, the test results in an error; only the symbol nil can be compared to any value.

The comparison ``expr1 /= expr2'' holds iff the comparison ``expr1 = expr2'' does not hold.
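
The cases above can be sketched as follows. The subrule name and the symbols are hypothetical; the assert statement (``!'') is described later in this chapter.

```malaga
# Hypothetical subrule illustrating "=" and "/=".
subrule equality_demo ($x):
  # Records are equal regardless of the order of their attributes.
  ! [Class: Verb, Tense: Past] = [Tense: Past, Class: Verb];
  # Lists are compared element by element, so the order matters.
  ! <A, B> /= <B, A>;
  # nil can be compared to a value of any type without an error.
  return ($x = nil);
end equality_demo;
```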

6.5.2  The Operators ``less'', ``less_equal'', ``greater'', ``greater_equal''

A condition of type ``expr1 operator expr2'' compares two numbers. Here, operator can have the following values:
operator        meaning
less            <
less_equal      ≤
greater         >
greater_equal   ≥
If either expr1 or expr2 is not a number, an error will be reported.

6.5.3  The Operators ``~'' and ``/~''

For a comparison ``expr1 ~ expr2'', expr1 and expr2 must be lists or symbols.

If expr1 and expr2 are symbols, the lists of their atomic symbols (atoms(expr1) and atoms(expr2)) will be used for the comparison instead of the symbols themselves.

The comparison tests whether the lists are congruent, that is, whether they have at least one element in common.

The comparison ``expr1 /~ expr2'' holds iff the comparison ``expr1 ~ expr2'' does not hold.
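
A minimal sketch of a congruence test, assuming a symbol file that contains ``singular; plural; singular_plural := <singular, plural>;''. The subrule name is hypothetical.

```malaga
# Hypothetical subrule: do two number values have an atomic symbol in common?
subrule agrees ($number1, $number2):
  # For example, singular ~ singular_plural holds,
  # whereas singular ~ plural does not.
  return ($number1 ~ $number2);
end agrees;
```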

6.5.4  The Operator ``in''

The operator ``in'' can only be used in the following ways:
  1. The condition ``symbol in record'' holds iff record contains an attribute named symbol.
  2. The condition ``value in list'' holds iff value is an element of list.

6.5.5  The matches Condition (Regular Expressions)

$$ Match-Comparison ::= Expression "matches" "(" Segment {"," Segment} ")".
$$ Segment ::= [Variable ":"] Constant-Expression .
The condition
expr matches (pattern)
interprets pattern as a pattern (a regular expression) and tests whether expr matches pattern. Patterns are defined as follows:
pattern ::= alternative { ``|'' alternative }

The string must be identical with one of the alternatives.

alternative ::= { atom [ ``*'' | ``?'' | ``+'' ] }

An alternative is a (possibly empty) sequence of atoms. An atom in a pattern corresponds to a character in a string. By using an optional postfix operator it is possible to specify for any atom how often it may be repeated within the string at that location: zero times or once (``?''), at least once (``+''), or arbitrarily often, including zero times (``*'').

atom ::= ``('' pattern ``)''

A pattern may be grouped by parentheses.

atom ::= ``['' [ ``^'' ] range { range } ``]''

A character class. It represents exactly one character from one of the ranges. If the symbol ``^'' is the first one in the class, the expression represents exactly one character that is not contained in one of the ranges.

atom ::= ``.''

Represents any character.

atom ::= character

Represents the character itself.

range ::= character1 [ ``-'' character2 ]

The range contains any character with a code at least as big as the code of character1 and not bigger than the code of character2. The code of character2 must be at least as big as the code of character1. If character2 is omitted, the range only contains character1.

character ::= Any character except ``*?+[]^-.\|()''

To use one of the characters ``*?+[]^-.\|()'', it must be preceded by a ``\'' (escape character).
You can divide the pattern into segments:
$surf matches ("un|in|im|ir|il", ".*", "(en)?")
is the same as
$surf matches ("(un|in|im|ir|il).*(en)?")
A section of the string can be stored in a variable by prefixing the respective pattern with ``variable_name:'', as in
$surf matches ($a: "un|in|im|ir|il", ".*")
The variables defined by pattern matching are only defined in the statement sequence that is executed if the pattern matching is successful. A negated matches condition may not contain variable definitions.
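
The scope of such variables can be sketched as follows. The subrule name is hypothetical; $prefix and $rest exist only in the then branch.

```malaga
# Hypothetical subrule: strip a negative prefix from a surface, if present.
subrule strip_prefix ($surf):
  if $surf matches ($prefix: "un|in|im|ir|il", $rest: ".*") then
    # $prefix and $rest are defined only in this statement sequence.
    return $rest;
  else
    return $surf;
  end if;
end strip_prefix;
```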

6.6  The Operators not, and, and or

Conditions can be combined logically. The operator not takes exactly one argument; a complex condition used as its argument has to be put in parentheses ``( )''.

The operators ``and'' and ``or'' may not be mixed at the same level of a condition; otherwise the order of evaluation would be ambiguous. Where both are needed, the inner condition has to be put in parentheses ``( )''.

6.7  The Symbol Table

$$ Symbol-Definition ::= Symbol [":=" "<" Symbol {"," Symbol} ">"] ";".
Every symbol used in a grammar has to be defined exactly once in the symbol table. Every symbol must be followed by a semicolon:
verb; noun; adjective;
Symbols that are defined this way are called atomic symbols. A symbol can also be defined as a multi-symbol. In that case the entry for the symbol has the following format:
symbol := list;
The list for this symbol must consist of at least two atomic symbols, all different from those that have already been defined. This list will be used by the operators ``~'' and ``/~'' and by the functions ``atoms'' and ``multi''. The lists in the symbol table must all be different; two lists that differ only in the order of their elements are not allowed.
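
A small symbol table might look like the sketch below. All symbol names are hypothetical, and the sketch assumes that the atomic symbols used in a multi-symbol list are defined as atomic symbols themselves.

```malaga
# Atomic symbols.
singular; plural;
nominative; genitive; dative; accusative;

# Multi-symbols, each with its own list of at least two atomic symbols.
singular_plural := <singular, plural>;
nom_acc := <nominative, accusative>;
```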

6.8  The Initial State

$$ Initial ::= "initial" Constant-Expression "," Rule-Set ";" .
$$ Rule-Set ::= "rules" (Rules {"else" Rules} | "(" Rules {"else" Rules} ")") .
$$ Rules ::= Rule-Name {"," Rule-Name} .
The initial state in a combination rule file is defined as follows:
initial value, rules rule1, rule2, ...;
The initial state specifies a category for the empty word start (or sentence start) in a combi rule file; the rules listed after the keyword rules are applied in parallel to combine the empty word (sentence) start with the first allomorph (word form). The rules may be enclosed in parentheses.

If you want rules to be executed only if no other rule has been successful, you can put their names behind the other rules' names and write an else in front of them:
initial value, rules rule1, rule2 else rule3, rule4 else ...;
If neither of the normal rules rule1 and rule2 has been successful, rule3 and rule4 are executed. If these rules also fail, the next rules are executed, and so on.
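
For example, an initial state with a fallback rule could be written as in the sketch below. The rule names are hypothetical and would have to be defined as combi-rules in the same file.

```malaga
# Start with the category nil; try first_word and prefix in parallel,
# and run unknown_start only if neither of them was successful.
initial nil, rules first_word, prefix else unknown_start;
```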

6.9  The Constant Definition

$$ Constant-Definition ::= "define" Constant ":=" Constant-Expression ";" .
A constant definition is of the form
define @constant := expr;
The constant expression expr is evaluated and the constant @constant is defined to have this value. The constant must not have been defined previously. The constant is valid from this definition up to the end of the rule file.
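
A minimal sketch of a constant and a subrule that uses it. The names ``@vowels'' and ``is_vowel'' are hypothetical.

```malaga
# Defined once, outside any rule; usable in all rules that follow.
define @vowels := <"a", "e", "i", "o", "u">;

subrule is_vowel ($letter):
  return ($letter in @vowels);   # yes iff $letter is an element of the list
end is_vowel;
```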

6.10  Rules

$$ Rule ::= Rule-Type Rule-Name "(" Variable {"," Variable} ")" ":"
$$          {Statement} "end" [Rule-Type] [Rule-Name] ";" .
$$ Rule-Type ::= "allo_rule" | "combi_rule" | "end_rule" | "pruning_rule"
$$               | "robust_rule" | "input_filter" | "output_filter" | "subrule" .
A rule is a sequence of statements that is executed as a unit:
combi_rule name ($param1, $param2, ...):
    statement1
    statement2
    ...
end name;
A rule has to begin with one of the keywords allo_rule, combi_rule, end_rule, pruning_rule, robust_rule, input_filter, output_filter or subrule. The keyword is followed by the rule name and the parameter list, a list of variable names in parentheses. The variables are assigned the parameter values when the rule is executed. The number of parameters depends on the rule type. The rule types have the following meanings:
``allo_rule ($lex_entry)'':
An allo-rule must occur exactly once in an allomorph rule file. It analyses a lexical entry and must generate one or more allomorph entries (via result). An allomorph rule has one parameter, namely the lexicon entry.
``combi_rule ($start, $next, $surf, $index)'':
Any number of combi-rules may occur in a combi-rule file. Before such a rule is processed, the next segment (either the next allomorph or the next word form) is read. The first parameter is the Start category, the second is the Next category, the third is the Next surface, and the fourth is the Next index. The third and the fourth parameter are optional. A combi-rule may state a successor rule set or accept the analysed input (both via result).
``pruning_rule ($list)'':
A pruning-rule may occur at most once in a syntax rule file. During syntax analysis, it can decide which states are still valid and which are to be deleted. The parameter is a list of categories of the states that have consumed the same input so far. The pruning-rule must execute a return statement with a list of yes- and no-symbols. Each state in $list corresponds to a symbol in the result list. If the symbol is yes, the corresponding state is preserved. If the symbol is no, the state is abandoned.
``robust_rule ($surface)'':
A robust-rule may occur at most once in a morphology rule file. If robust analysis has been switched on by the robust command, and a word form could not be recognised by the combi-rules, the robust-rule is executed with the surface of the word form as its parameter. A robust-rule can accept the word form via result.
``input_filter ($cat_list)'':
An input-filter may occur at most once in a syntax rule file. The input-filter is called after a word form has been analysed. It gets one parameter, namely the list of the analysis results, and it transforms it to one or more filtered results (via result).
``output_filter ($cat_list)'':
An output-filter may occur at most once in any rule file.
In allo-rule files:
The output-filter is called after all lexicon entries have been processed by the allo-rules. The filter is called for every allomorph surface. It gets one parameter, namely the list of the generated categories with that surface, and it transforms it to one or more filtered allomorph categories (via result).
In combi-rule files:
The output-filter is called after an item has been analysed. It gets one parameter, namely the list of the analysis results, and it transforms it to one or more filtered results (via result).
``subrule ($param1, $param2, ...)'':
Any number of subrules may occur in any rule file. A subrule can be invoked from other rules and it must return a value to this rule via return. It can have any number of parameters (at least one).
When a rule is executed, all statements in the rule are processed sequentially; after that, the rule execution terminates. The if statement, the foreach statement, and the parallel statement may change the processing order. Special conditions apply if:
  1. A condition in a require statement does not hold. In this case the processing of the rule path is terminated. This is not an error.
  2. The fail statement was executed. This is a special case of case 1.
  3. An assert condition does not hold. In this case the processing of the whole grammar is terminated and an error message is displayed. This rule termination can be used to find categorisation or programming flaws in the rule system or in the lexicon.
  4. The error statement was executed. This is a special case of case 3.
  5. The return statement was executed in a subrule or in a pruning rule. In a subrule, this terminates the subrule in the current rule path and immediately returns to the calling rule. In a pruning rule, this terminates the pruning rule.
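
The sketch below shows a complete combi-rule that uses several of these mechanisms. It assumes that categories are records and that ``record.symbol'' yields the value of the named attribute; the rule name, attribute names and symbols are hypothetical.

```malaga
# Hypothetical combi-rule: combine a determiner with a following noun.
combi_rule det_noun ($start, $next):
  ? $start.Class = Determiner;   # otherwise this rule path ends, without error
  ? $next.Class = Noun;
  ! Agreement in $next;          # a noun without Agreement is a lexicon flaw
  result [Class: NounPhrase, Agreement: $next.Agreement], accept;
end det_noun;
```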

6.11  Statements

$$ Statement ::= Assert-Statement | Assignment
$$               | Choose-Statement | Define-Statement
$$               | Error-Statement | Fail-Statement | Foreach-Statement 
$$               | If-Statement | Parallel-Statement | Repeat-Statement
$$               | Require-Statement | Result-Statement | Return-Statement .
A rule body contains a sequence of statements.

The statements are the assignment and the statements beginning with assert, choose, define, error, fail, foreach, if, parallel, repeat, require, result, and return.

6.11.1  The assert Statement

$$ Assert-Statement ::= ("assert" | "!") Condition ";" .
The statement
assert condition;
or
! condition;
tests whether condition holds. If this is not the case, an error message with the line number in the source code is printed and the processing of all paths is terminated.

The assert statement should be used to check whether there are structural flaws in the lexicon or the rule system.

6.11.2  The Assignment

$$ Assignment ::= Variable {"." Value} 
$$                (":=" | ":=+" | ":=-" | ":=*" | ":=/") Expression ";" .
To set the value of an already defined variable to a different value, use a statement of the following form:
$var := expr;
The expression expr is evaluated and the result is assigned to the variable $var. The variable must have already been defined.

You can optionally specify a path behind the variable that is to be set by an assignment:
$var.part1.part2 := value;
In this case, only the value of ``$var.part1.part2'' will be set to value; the remainder of the variable $var will be unchanged. Each part must be an expression that evaluates to a symbol, a number or a list of symbols and numbers.

You can also use one of four other assignment operators instead of the operator ``:='': The statement ``$var :=+ value;'' is a shorthand for ``$var := $var + value;''; the same holds analogously for the assignment operators ``:=-'', ``:=*'', and ``:=/''. Here, $var may be followed by a path again.
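
A minimal sketch of an assignment with a path. It assumes that $cat is a record that already has an attribute ``Numerus''; the subrule name, the attribute name and the symbol ``plural'' are hypothetical.

```malaga
# Hypothetical subrule: return a copy of $cat with its Numerus set to plural.
subrule mark_plural ($cat):
  define $new_cat := $cat;
  $new_cat.Numerus := plural;   # only this part of $new_cat is changed
  return $new_cat;
end mark_plural;
```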

6.11.3  The choose Statement

$$ Choose-Statement ::= "choose" Variable "in" Expression ";" .
The choose statement chooses an element of a list. Its format is:
choose $var in expr;
For every element in the list expr a rule path is created; in this rule path the element is stored in the variable $var. Thus the number of rule paths can multiply. If, for example, expr has the value <A, B, C>, the currently processed rule path has three continuations: In the first one $var has the value A, in the second one it has the value B and in the third one it has the value C. The three paths behave independently from now on; some may fail while others may be processed successfully, and the results can be different.

The choose statement can also be used for records. In that case, the variable $var gets a different attribute name of the record expr in each path.
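
The branching behaviour can be sketched as follows, assuming that the Next category has a hypothetical attribute ``Readings'' that holds a list of records and that ``record.symbol'' yields an attribute value.

```malaga
# Hypothetical combi-rule: try every reading of the next word form.
combi_rule pick_reading ($start, $next):
  choose $reading in $next.Readings;   # one rule path per list element
  ? $reading.Class = Noun;             # paths with other readings end here
  result $reading, accept;
end pick_reading;
```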

The choose statement also works for numbers:

6.11.4  The define Statement

$$ Define-Statement ::= "define" Variable ":=" Expression ";" .
A define statement is of the form
define $var := expr;
The expression expr is evaluated and the result is assigned to the variable $var. The variable must not have been defined before this statement; it is defined by the statement and exists only until the statement sequence containing the definition has been processed completely.

6.11.5  The error Statement

$$ Error-Statement ::= "error" String ";" .
The error statement terminates the execution of all paths and prints the given error message string together with the line of the source text. Its format is:
error message;

6.11.6  The fail Statement

$$ Fail-Statement ::= "fail" ";" .
The fail statement terminates the current rule path. Its format is:
fail;

6.11.7  The foreach Statement

$$ Foreach-Statement ::= "foreach" Variable "in" Expression ":" {Statement}
$$                       "end" ["foreach"] ";" .
You may wish to manipulate all elements of a list or a record sequentially in one rule path. For this purpose, the foreach statement was introduced. It has the following format:
foreach $var in expr: statements end foreach;
The first, second, third, ... elements of the list expr are assigned to $var in turn, and the statement sequence statements is executed for each of those assignments.

Each time the statements are executed, the variable $var is defined anew. Its scope is the block statements.

The foreach statement also works for records. In that case, the variable $var is assigned the first, second, ... attribute name of the record expr.
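
As a sketch, the following subrule counts the noun categories in a list. It assumes that ``+'' applied to two numbers is addition and that ``record.symbol'' yields an attribute value; all names are hypothetical.

```malaga
# Hypothetical subrule: count the records in $cats whose Class is Noun.
subrule count_nouns ($cats):
  define $count := 0;
  foreach $cat in $cats:
    if $cat.Class = Noun then
      $count :=+ 1;        # shorthand for $count := $count + 1
    end if;
  end foreach;
  return $count;
end count_nouns;
```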

The foreach statement also works for numbers:

6.11.8  The if Statement

$$ If-Statement ::= "if" Condition "then" {Statement}
$$                  {"elseif" Condition "then" {Statement}}
$$                  ["else" {Statement}] "end" ["if"] ";" .
An if statement has the following form:
if condition1 then statements1
elseif condition2 then statements2
else     statements3
end if ;
The second line may be repeated any number of times (including zero); the third line may be omitted.

First, condition1 is evaluated. If it is satisfied, the statement sequence statements1 is executed.

If the first condition is not satisfied, condition2 is evaluated; if the result is true, statements2 is executed. This procedure is repeated for every elseif part until a condition is satisfied.

If the if condition and elseif conditions fail, the statement sequence statements3 is executed (if it exists).

After the if statement has been processed the next statement is executed.

The if after the end may be omitted.
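
A deliberately simplistic sketch of an if statement with an elseif part. The subrule name and the suffix choices are purely illustrative.

```malaga
# Hypothetical subrule: pick a suffix depending on how $stem ends.
subrule choose_suffix ($stem):
  if $stem matches (".*(s|x|z)") then
    return "es";
  elseif $stem matches (".*y") then
    return "ies";
  else
    return "s";
  end if;
end choose_suffix;
```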

6.11.9  The parallel Statement

$$ Parallel-Statement ::= "parallel" {Statement} {"and" {Statement}}
$$                        "end" ["parallel"] ";" .
Using the parallel statement more than one continuation of an analysis can be generated. Its format is:
parallel statements1
and statements2
and statements3
...
end parallel;
This creates as many rule paths as there are statement sequences. In the first rule path, statements1 are executed, in the second one statements2 are executed, etc. Each rule path continues by executing the statements following the parallel statement.

The keyword parallel after the end can be omitted.
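
The following sketch creates two rule paths that differ only in the value of one variable. The rule name, attribute names and symbols are hypothetical.

```malaga
# Hypothetical combi-rule: accept the input once as singular, once as plural.
combi_rule noun_reading ($start, $next):
  define $numerus := nil;
  parallel
    $numerus := singular;
  and
    $numerus := plural;
  end parallel;
  # Both rule paths continue here, each with its own value of $numerus.
  result [Class: Noun, Numerus: $numerus], accept;
end noun_reading;
```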

6.11.10  The repeat Statement

$$ Repeat-Statement ::= "repeat" {Statement} "while" Condition ";" {Statement}
$$                      "end" ["while"] ";" .
You may wish to repeat a sequence of statements while a specific condition holds. This can be realised by the repeat loop. It has the following form:
repeat
statements1
while condition ;
statements2
end while;
The statements statements1 are executed. Then, condition is tested. If it holds, the statements2 are executed and the repeat statement is executed again. If condition does not hold, execution proceeds after the repeat statement.
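
A minimal sketch of a repeat loop, assuming that ``+'' applied to two numbers is addition. The subrule name is hypothetical, and the first statement sequence is left empty here.

```malaga
# Hypothetical subrule: count upwards from 0 until $limit is reached.
subrule count_up_to ($limit):
  define $n := 0;
  repeat
    # statements1 would go here; this sketch has none.
  while $n less $limit;
    $n :=+ 1;              # statements2: executed only while the condition holds
  end while;
  return $n;
end count_up_to;
```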

6.11.11  The require Statement

$$ Require-Statement ::= ("require" | "?") Condition ";" .
A statement of the form
require condition;
or
? condition;
tests whether condition is true. If this is not the case, the rule path is terminated without an error message. Require statements should be used to decide whether the word start (sentence start) read so far is grammatical according to the interpretation of the rule path.

6.11.12  The result Statement

$$ Result-Statement ::= "result" Expression ["," (Rule-Set | "accept")] ";" .
In combi rules:
The statement
result expr,
rules rule1, rule2, ...;
specifies the Result category of the rule and the successor rules. The value expr is the Result category. After the keyword rules, the names of all successor rules are listed. For every successor rule that is executed, a new rule path is created. The rule set may be enclosed in parentheses.

If you want successor rules to be executed only if no other rule has been successful, you can put their names behind the other rules' names and write an else in front of them:
rules rule1, rule2 else rule3, rule4 else ...;
If none of the normal rules (here: rule1 and rule2) has been successful, rule3 and rule4 are executed. If these rules also fail, the next rules are executed, and so on. A rule has been successful if it has executed at least one result statement.

In combi-rules and end-rules:
If the input is to be accepted by the result statement (and therefore no successor rules are to be called) the following format has to be used:
result expr, accept;
If this statement is reached in a rule path, the input is accepted as grammatically well-formed. The value expr is returned as the result of the morphological or syntactic analysis.

In filters and robust-rules:
The format of a result statement in a filter or robust-rule is:
result expr;
If this statement is reached, the value expr is used as a result of the executed rule.

In allo-rules:
The format of the result statement in an allo-rule is:
result surface, category;
It creates an entry in the allomorph lexicon. The value surface, which is the surface of the allomorph, must be a string; category is the categorical information of the allomorph.

6.11.13  The return Statement

$$ Return-Statement ::= "return" Expression ";" .
In a subrule, the return statement is of the following form:
return expr;
The value of expr is returned to the rule that invoked this subrule and the subrule execution is finished.

In a pruning rule, the return statement is of the same form. Here, expr must be a list of yes- and no-symbols. Each state in the category list, which is the pruning rule parameter, corresponds to a symbol in the result list. If the symbol is yes, the corresponding state is preserved. If the symbol is no, the state is abandoned.

6.12  Files

A Malaga grammar system comprises several files: a symbol file, a lexicon file, an allomorph rule file, a morphology rule file, an extended symbol file (optional), and a syntax rule file (optional). The type of a file is indicated by the suffix of its file name. A grammar for the English language may consist of the files ``english.sym'', ``english.lex'', ``english.all'', ``english.mor'' and ``english.syn''.

6.12.1  The Symbol File

$$ Symbol-File ::= {Symbol-Definition | Include} .
A symbol file has the suffix ``.sym''. It contains the symbol table.

6.12.2  The Extended Symbol File

$$ Extended-Symbol-File ::= Symbol-File .
An extended symbol file has the suffix ``.esym''. It contains an additional symbol table whose symbols may only be used in the syntax rule file.

6.12.3  The Lexicon File

$$ Lexicon-File ::= {Constant-Definition | Constant-Expression ";"} .
A lexicon file has the suffix ``.lex''. It consists of any number of values and constant definitions, each terminated by a semicolon. Each value stands for a lexical entry. A value may contain named constants and the operators ``.'', ``+'', ``-'', ``*'', and ``/''. The format of the lexical entries is free, although it should be consistent with the conception of the whole rule system.

6.12.4  The Allomorph Rule File

$$ Rule-File ::= {Rule | Constant-Definition | Initial | Include} .
$$ Allomorph-Rule-File ::= Rule-File .
The allomorph lexicon is generated from the base form lexicon by applying the allo-rule to the base form entries. The allomorph rule file has the suffix ``.all'' and consists of one allo-rule, an optional output-filter, and any number of subrules and constant definitions.

For every lexical entry, the allo-rule is executed with the value of the lexicon entry as parameter. The allo-rule can generate allomorphs using the result statement.

After all allomorphs have been produced, the output-filter is executed once for each surface in the (intermediate) allomorph lexicon. As its parameter, the output-filter gets the list of categories that share that surface. An entry in the final allomorph lexicon is created every time the result statement is executed. The surface cannot be changed by the output-filter.
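The two phases of allomorph generation can be summarised in a short Python model. It is a sketch under assumptions rather than the actual implementation: the function build_allomorph_lexicon, the sample rule and filter, and the data are hypothetical, and keeping all intermediate entries when no output-filter is given is an assumption of this sketch.

```python
from collections import defaultdict


def build_allomorph_lexicon(lexicon, allo_rule, output_filter=None):
    # Phase 1: run the allo-rule once per lexical entry. Each "result"
    # is modelled as a (surface, category) pair.
    intermediate = defaultdict(list)
    for entry in lexicon:
        for surface, category in allo_rule(entry):
            if not isinstance(surface, str):
                raise TypeError("allomorph surface must be a string")
            intermediate[surface].append(category)

    # Phase 2: run the output-filter once per surface. It sees only the
    # categories that share the surface, so it cannot change the surface.
    final = []
    for surface, categories in intermediate.items():
        # Assumption: without an output-filter, all entries are kept.
        kept = categories if output_filter is None else output_filter(categories)
        final.extend((surface, category) for category in kept)
    return final


def allo_rule(entry):
    # Hypothetical allo-rule: one allomorph per entry, the base form itself.
    yield entry["base"], entry["pos"]


def drop_duplicates(categories):
    # Hypothetical output-filter: emit each distinct category once.
    unique = []
    for category in categories:
        if category not in unique:
            unique.append(category)
    return unique


lexicon = [
    {"base": "fly", "pos": "noun"},
    {"base": "fly", "pos": "verb"},
    {"base": "fly", "pos": "noun"},
    {"base": "cat", "pos": "noun"},
]
allomorphs = build_allomorph_lexicon(lexicon, allo_rule, drop_duplicates)
```

The filter is called twice here, once for "fly" with three categories and once for "cat" with one.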

6.12.5  The Combi-Rule Files

$$ Combi-Rule-File ::= Rule-File .
A grammar system includes up to two combination rule files: one for morphological combination with the suffix ``.mor'' and (optionally) one for syntactic combination with the suffix ``.syn''.

A combination rule file consists of an initial state and any number of combi-rules, subrules, and constant definitions. A syntax rule file may contain one optional pruning-rule, one optional input-filter and one optional output-filter; a morphology rule file may contain one optional robust-rule and one optional output-filter.

Beginning with the rules listed in the initial state, the rules and their successors are processed until a result statement with the keyword accept is encountered in every path. A path dies if there is no more input (from the lexicon or from the morphology) that can be processed.

In morphology, if analysis has created no result and robust analysis has been switched on, the robust-rule will be called with the analysis surface and can create a result.
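The fallback to the robust-rule can be read as a simple two-step procedure. The Python sketch below models it under assumptions; the function analyse_morphology, its parameters, and the sample callbacks are hypothetical and not part of Malaga.

```python
def analyse_morphology(surface, analyse, robust_rule=None, robust_enabled=False):
    """Return the normal results, or the robust-rule's results as a fallback."""
    results = analyse(surface)
    # The robust-rule is consulted only if normal analysis produced nothing
    # and robust analysis has been switched on.
    if not results and robust_enabled and robust_rule is not None:
        results = robust_rule(surface)
    return results


def analyse(surface):
    # Hypothetical analysis that only knows the word "house".
    return [{"class": "noun"}] if surface == "house" else []


def robust_rule(surface):
    # Hypothetical robust-rule: treat any unknown surface as a name.
    return [{"class": "name", "surface": surface}]


known = analyse_morphology("house", analyse, robust_rule, robust_enabled=True)
unknown = analyse_morphology("Zyx", analyse, robust_rule, robust_enabled=True)
ignored = analyse_morphology("Zyx", analyse, robust_rule, robust_enabled=False)
```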

In syntax, when a new word form has been imported from morphology, the input-filter can take a look at its categories and create new result categories.

In syntax, if a pruning-rule is present and pruning has been activated, the concatenation of the next word form is preceded by the following step: The categories of all current LAG states are merged into a list, which is the parameter of the pruning rule. The pruning-rule must execute a return statement with a list of yes- and no-symbols. Each state in the category list corresponds to a symbol in the result list. If the symbol is yes, the corresponding state is preserved. If the symbol is no, the state is abandoned.
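The pruning step pairs each state with one symbol of the returned list. The following Python model sketches that pairing; it is not Malaga code, the names apply_pruning and keep_nominal and the sample states are hypothetical, and the length check is an assumption of this sketch.

```python
def apply_pruning(states, pruning_rule):
    """Keep the states for which the pruning rule answers "yes"."""
    # The categories of all current states form the rule's parameter.
    categories = [state["category"] for state in states]
    verdicts = pruning_rule(categories)
    # Assumption: the rule must return exactly one symbol per state.
    if len(verdicts) != len(categories):
        raise ValueError("pruning rule must return one symbol per state")
    return [state for state, verdict in zip(states, verdicts) if verdict == "yes"]


def keep_nominal(categories):
    # Hypothetical pruning rule: preserve only states of class "noun".
    return ["yes" if c.get("class") == "noun" else "no" for c in categories]


states = [
    {"id": 1, "category": {"class": "noun"}},
    {"id": 2, "category": {"class": "verb"}},
    {"id": 3, "category": {"class": "noun"}},
]
survivors = apply_pruning(states, keep_nominal)
```

The rule returns yes, no, yes for these states, so the first and third state are preserved and the second is abandoned.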

After analysis, the output-filter can take a look at all result categories and create new result categories.

