Malaga 4.3

Björn Beutel
Abteilung für Computerlinguistik
Universität Erlangen-Nürnberg, Germany

August 18th, 1999

Chapter 1  Introduction

The name ``Malaga'' has two different meanings: on the one hand, it is the name of a special-purpose programming language, namely a language for implementing grammars for natural languages. On the other hand, it is the name of a program package for developing Malaga grammars and testing them by analysing words and sentences.

``Malaga'' is an acronym for ``Merely a Left-Associative-Grammar Application''. We will explain the formalism of Left Associative Grammars (LAG) later.

The program package ``Malaga'' has been developed by Björn Beutel in the ``Abteilung für Computerlinguistik der Universität Erlangen-Nürnberg'', Germany. It has a number of predecessors: the program packages LAMA, IMP, MAGIC, MOSAIC and LAP, all of which were developed at the same department. They are all based on LAG.

Gerald Schüller has implemented parts of the original debugger, the original Emacs Malaga mode and the original Tree and Variable output.

As of 1999, morphology grammars exist for several real-world languages, for example German, Italian, English and Korean.

If you have questions, criticism or suggestions for the improvement of Malaga, you can send an e-mail to malaga@linguistik.uni-erlangen.de or write to the following address:

Bjoern Beutel
Universitaet Erlangen-Nuernberg
Abteilung fuer Computerlinguistik
Bismarckstrasse 12
D-91054 Erlangen
Germany


Chapter 2  Left Associative Grammars

A formal grammar for a natural language can be used to check whether a sentence or a word form is grammatically well-formed (a word form is a special flectional form of a word, so ``book'' and ``books'' are two different word forms of the word ``book''). Furthermore, it can describe the structure and meaning of a sentence or a word form by a data structure that is constructed during the analysis process.

The Left Associative Grammar (LAG) is such a kind of formal grammar. An LAG analyses a sentence (or a word form) step by step: its parts are concatenated from the left to the right, hence the name ``Left Associative Grammar''. A single LAG rule can only join two parts to a bigger one: it concatenates the Start part (which is the beginning of the sentence or word form that has already been analysed) and the Next part (which is the next word form or the next allomorph). Take a look at the following sentence:
Shakespeare liked writing comedies.
The sentence is analysed in five rule applications:
``'' + ``Shakespeare''
``Shakespeare'' + ``liked''
``Shakespeare liked'' + ``writing''
``Shakespeare liked writing'' + ``comedies''
``Shakespeare liked writing comedies'' + ``.''
To apply a rule it's not sufficient to know the spelling of a word or an allomorph. A rule also requires morphological and syntactic information, such as word class, gender, meaning of a suffix and much more. This information associated with a part of speech (sentence, word form or allomorph) is called its category. The analysis of a sentence or a word returns such a category as result.
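
The five rule applications above follow a fixed pattern: the result of each step becomes the Start part of the next one. The following minimal Python sketch enumerates the Start/Next surface pairs for an input that has already been split into word forms. The function name rule_applications is hypothetical and not part of Malaga, and the sketch ignores categories entirely.

```python
def rule_applications(parts):
    """Return the (start, next) surface pairs of a left-associative analysis.

    `parts` is the input, already split into word forms (or allomorphs).
    """
    steps = []
    start = []                          # the empty sentence start
    for nxt in parts:
        steps.append((" ".join(start), nxt))
        start.append(nxt)               # the result becomes the new start
    return steps


for start, nxt in rule_applications(
        ["Shakespeare", "liked", "writing", "comedies", "."]):
    print(f'"{start}" + "{nxt}"')
```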

Now we'll take a closer look at how a sentence is analysed.
  1. Before we can start to analyse a sentence, the analysis automaton must be in an initial state. The initial state determines:
    1. the category of the empty sentence start, and
    2. the combination rule checking whether it is allowed to combine the empty sentence start with the first word form (which is yet to be read). This rule also determines the resulting category of the new sentence start (which consists of the old sentence start and the first word form concatenated).


  2. The next word form to be analysed is read and analysed morphologically. If there is no valid word form, the analysis process aborts.

  3. The category that morphology assigns to this word form is called the Next category. The category of the input that has been analysed syntactically so far is called the Start category.

  4. The active combination rule checks whether it is allowed to combine the sentence start (which may be empty), represented by the Start category, with the next word form, represented by the Next category. In a rule, categories can be compared by logical tests, and finally the category of the new sentence start (including the word form that has been read), the Result category, is constructed by the rule. The rule finally specifies which successor rule is active in the next step. Execution then continues at step 2.

    Instead of calling a successor rule a rule can also accept the analysed sentence. In this case the Result category of this rule is the category of the complete analysed sentence.
Morphological analysis operates analogously, except that a word form, composed from allomorphs, is being analysed. The next allomorph (step 2) is found in the allomorph lexicon.

This sketch is of course simplified. An analysis can contain ambiguities, which can have several causes. They are handled by dividing the analysis into several subanalyses: if there are two lexicon entries for a word form, for example, the analysis continues using the first entry (and its category) as well as the second one. You can compare this with a branching path. The analyses are continued independently of each other, so one analysis can succeed while the other fails. Each analysis path can divide repeatedly if another ambiguity is met. If several analysis paths are continued until they accept, the analysis process returns more than one result.
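
The analysis loop and the branching of paths described above can be illustrated with a small Python sketch. It is not how Malaga is implemented: the names analyse and ACCEPT, the toy rules and the toy lexicon are all hypothetical, categories are reduced to lists of word-class names, and "accepting" is modelled by a special marker in the list of successor rules.

```python
ACCEPT = "accept"       # hypothetical marker: the rule accepts the input


def analyse(words, lexicon, rules, initial_category, initial_rules):
    """Follow every analysis path; return the categories of accepted paths."""
    # Each path is a pair (Start category, names of the active rules).
    paths = [(initial_category, initial_rules)]
    for word in words:
        new_paths = []
        for start_cat, active in paths:
            for next_cat in lexicon.get(word, []):       # one branch per entry
                for name in active:                      # one branch per rule
                    if name == ACCEPT:
                        continue
                    outcome = rules[name](start_cat, next_cat)
                    if outcome is not None:              # the rule succeeded
                        new_paths.append(outcome)
        paths = new_paths                                # failed paths vanish
    return [cat for cat, active in paths if ACCEPT in active]


def first_word(start, nxt):
    # Toy rule: an adjective must be followed by a noun, anything else accepts.
    if nxt == "adjective":
        return ([nxt], ["adj_noun"])
    return ([nxt], [ACCEPT])


def adj_noun(start, nxt):
    # Toy rule: only a noun may follow the adjective.
    if nxt == "noun":
        return (start + [nxt], [ACCEPT])
    return None


rules = {"first_word": first_word, "adj_noun": adj_noun}
lexicon = {"blue": ["noun", "adjective"], "sky": ["noun"]}

print(analyse(["blue"], lexicon, rules, [], ["first_word"]))
print(analyse(["blue", "sky"], lexicon, rules, [], ["first_word"]))
```

In the second call the path that read ``blue'' as a noun has no successor rule and is dropped, while the adjective path continues and accepts.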

Chapter 3  The Malaga Programs

The Malaga programs are all started in a similar manner: either you give the name of a project file as argument (this is not possible if you start malrul or malsym), or you give the name of the files that are needed by the program (for malmake, you have to give the project file as argument). The file type is recognised by the file name ending.

Assume you've written a grammar that consists of a symbol file ``english.sym'', an allomorph rule file ``english.all'', a lexicon file ``english.lex'' and a morphology rule file ``english.mor'', and you have also written a project file ``english.pro''. Then you can start the program malaga in two ways (after you've compiled the grammar files):

malaga english.pro

or

malaga english.sym_c english.mor_c english.lex_c

If you use the first command line, the names of the grammar files will be read from the project file. The second command line contains the names of the compiled files explicitly. The order of the names is of no importance. The name of the allomorph rule file must not be included if you are starting malaga, since this file is not used by malaga itself, but it's needed by mallex to compile the lexicon file.
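
Recognising a file type by its name ending can be sketched as a simple table lookup. The Python fragment below is only an illustration under the assumption that the endings named in this manual are the complete set; the names FILE_TYPES and file_type are hypothetical.

```python
import os

# Endings named in this manual; a trailing "_c" marks a compiled file.
FILE_TYPES = {
    ".pro": "project file",
    ".sym": "symbol file",
    ".esym": "extended symbol file",
    ".all": "allomorph rule file",
    ".lex": "lexicon file",
    ".mor": "morphology rule file",
    ".syn": "syntax rule file",
}


def file_type(name):
    """Classify a file by its name ending, e.g. "english.sym_c"."""
    ending = os.path.splitext(name)[1]
    compiled = ending.endswith("_c")
    if compiled:
        ending = ending[:-2]
    kind = FILE_TYPES.get(ending)
    if kind is None or (compiled and ending == ".pro"):
        return "unknown"
    return "compiled " + kind if compiled else kind


for name in ["english.pro", "english.sym_c", "english.mor_c", "english.lex_c"]:
    print(name, "->", file_type(name))
```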

If you just want to know which version of a Malaga program you are using, you can get the version number by using the option ``-version'' or ``-v'':

malrul -version

The program only emits a few lines with information about its version number and its purpose.

3.1  Projects

Several files, taken together, form a Malaga grammar. You can group these files into a project. To do this, write a project file, with a name ending in ``.pro'', in which you list the names of the files, each one behind a keyword (each file type on a line of its own). Imagine you have written a grammar that consists of the files ``standard.sym'', ``webster.lex'', ``english.all'', ``english.mor'' and ``english.syn''. The project file for this grammar will look like this:
sym: standard.sym
lex: webster.lex
all: english.all
mor: english.mor
syn: english.syn
By using the include statement, you can include further source files in your source files, so a part of your grammar can consist of several files. Assume you've got a lexicon file ``webster.lex'' that looks like this:
include "suffixes.lex";
include "verbs.lex";
include "adjectives.lex";
include "nouns.lex";
include "particles.lex";
include "abbreviations.lex";
include "names.lex";
include "numbers.lex";
In this case, you must write the names of all these files in the ``lex:'' line of your project file behind the name of the real lexicon file:
lex: webster.lex suffixes.lex verbs.lex adjectives.lex
lex: nouns.lex particles.lex abbreviations.lex names.lex numbers.lex
Since there are many files in this example, the ``lex:'' line has been divided into two lines, each line starting with ``lex:''.

If you want to extend an existing project (for example, you might want to add a syntax rule file to a morphology grammar), you can include the project file of the morphology grammar in the project file of your syntax grammar by using a line starting with ``include:'':
include: /projects/grammars/english/english.pro
syn: english_syntax.syn
The file entries in the project file of the morphology are treated as if they replaced the ``include:'' line.
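
The way repeated keyword lines accumulate and ``include:'' lines are expanded in place can be sketched in Python as follows. This is a reading aid, not Malaga's own parser: the function name read_project is hypothetical, and the sketch assumes that a relative path in an ``include:'' line is resolved against the directory of the including project file.

```python
import os
from collections import defaultdict

FILE_KEYWORDS = ("sym", "all", "lex", "mor", "syn")


def read_project(path, entries=None):
    """Collect the entries of a project file, following "include:" lines."""
    if entries is None:
        entries = defaultdict(list)
    base = os.path.dirname(path)
    with open(path) as project:
        for line in project:
            keyword, colon, rest = line.strip().partition(":")
            if not colon:
                continue                    # blank line or no keyword
            keyword, rest = keyword.strip(), rest.strip()
            if keyword == "include":
                # The included entries take the place of this line.
                read_project(os.path.join(base, rest), entries)
            elif keyword in FILE_KEYWORDS:
                entries[keyword].extend(rest.split())   # several names per line
            else:
                entries[keyword].append(rest)           # any other keyword line
    return entries
```

Calling read_project on a syntax project file that includes a morphology project file returns one dictionary in which, for example, all ``lex:'' lines of both files appear as a single list under the key "lex".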

The programs malaga and mallex can read option settings such as hidden or robust from the project file, so you do not need to set these options each time you start the program. Each line in the project file that starts with ``malaga:'' is executed when malaga starts, and each line that starts with ``mallex:'' is executed when mallex starts. Only the set command is allowed in these lines, so you can only set options. Here's an example:
  ...
malaga: set hidden +semantics
malaga: set robust on

mallex: set hidden +semantics +syntax
When you start malaga, the commands ``set hidden +semantics'' and ``set robust on'' will be executed; when you start mallex, the command ``set hidden +semantics +syntax'' will be executed.

Options in project files that are read in by ``include:'' lines in other project files will be executed as if they were at the position of the ``include:'' line.

Lines that start with ``morinfo:'' contain information about the morphology; lines that start with ``syninfo:'' contain information about the syntax. In malaga, you get this information if you use the command info. Example:
morinfo: =====================================
morinfo: Deutsche Malaga Morphologie 3.0
morinfo: written by Oliver Lorenz, 11.04.1997
morinfo: dmm@linguistik.uni-erlangen.de
morinfo: =====================================

3.2  The Malaga startup file ``.malagarc''

If there are options that you want to use with every Malaga project, you may create a personal startup file called ``.malagarc'' in your home directory. You can enter malaga and mallex options in the same manner as you do in the project file:
malaga: set hidden +semantics
malaga: set robust on

mallex: set hidden +semantics +syntax
The options in the project file are used first, so you can override options in the project file by setting them in the startup file. In the startup file, you should set the display option if you want to use the graphical display program written in Tcl/Tk.

You can set some attributes of the graphical user interface, namely the position, the size, and the font size of each window that is part of the user interface. Here is an example which sets every option available:
result_geometry: 628x480+640+0
result_font_size: 12

tree_geometry: 628x480+640+512
tree_font_size: 12

path_geometry: 628x480+640+0
path_font_size: 12

variables_geometry: 628x480+0+512
variables_font_size: 12
The geometry defines the size and/or position of each window. The first two numbers (``628x480'') define the width and the height of the window in pixels, the last two numbers (``+640+512'') define the position of its upper left corner. The available font sizes are 8, 10, 12, 14, and 18 pixels.
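
To make the structure of a geometry value concrete, here is a small Python sketch that splits it into its numbers. The name parse_geometry is hypothetical, and the sketch only covers the ``WIDTHxHEIGHT+X+Y'' form used in the example above, with either half optional.

```python
import re

# Optional "WIDTHxHEIGHT" followed by optional "+X+Y".
GEOMETRY = re.compile(r"^(?:(\d+)x(\d+))?(?:\+(\d+)\+(\d+))?$")


def parse_geometry(text):
    """Split a geometry such as "628x480+640+512" into its parts."""
    match = GEOMETRY.match(text.strip())
    if match is None or all(group is None for group in match.groups()):
        raise ValueError(f"not a geometry: {text!r}")
    width, height, x, y = (
        int(group) if group is not None else None for group in match.groups()
    )
    return {"width": width, "height": height, "x": x, "y": y}


print(parse_geometry("628x480+640+512"))
```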

3.3  The Program ``malaga''

The program malaga is the user interface for analysing word forms and sentences, displaying the results and finding bugs in a grammar. You can start malaga giving either the name of a project file or the names of the grammar files as arguments:

malaga english.pro

or

malaga english.sym_c english.mor_c english.lex_c english.syn_c

If you are not using a project file, you have to give the names of the compiled grammar files, as in the second command line above. When malaga has been started, it loads the symbol file, the lexicon file and the rule file(s). After loading, the prompt appears. Then malaga is ready to execute your commands:
malaga (4.3) - Copyright (C) 1995-1999 Bjoern Beutel
This program comes with ABSOLUTELY NO WARRANTY.
This is free software which you may redistribute under certain conditions.
For details, refer to the GNU General Public License.
malaga> 
You can now enter any malaga command. If you are not sure about the name of a command, use the command help to get an overview of all malaga commands.

If you want to quit malaga, enter the command quit.

You can use the following command line options when you start malaga:
``-morphology'' or ``-m'' starts malaga in morphology mode. That is, word forms are being read in from the standard input stream and analysed (one word form per line). The analysis result is being written to the standard output stream.
``-syntax'' or ``-s'' starts malaga in syntax mode. That is, sentences are being read in from the standard input stream and analysed (one sentence per line). The analysis result is being written to the standard output stream.

3.4  The Program ``mallex''

By using mallex, you can make the allomorph rules process the entries of a base form lexicon. A run time lexicon (with the ending ``.lex_c'') will be built. Normally, mallex starts in batch mode. If you want to run it interactively, you must give it the option ``-interactive'' or ``-i'' when starting (if you start it from Emacs with ``M-x mallex'', this will be done automatically).

You can start mallex either with the name of a project file or with the names of the needed grammar files:

mallex english.pro

or

mallex english.sym_c english.all_c english.lex

If you are not using a project file, you must give the names of the grammar files, as in the second command line above. If you have started mallex using the option ``-interactive'' or ``-i'', mallex runs interactively: it loads the symbol file and the allomorph rule file. Then the prompt appears:
mallex (4.3) - Copyright (C) 1995-1999 Bjoern Beutel
This program comes with ABSOLUTELY NO WARRANTY.
This is free software which you may redistribute under certain conditions.
For details, refer to the GNU General Public License.
mallex> 
You can now enter any mallex command. If you do not remember the command names, you can use the command help to see an overview of the mallex commands.

If you want to quit mallex, enter the command quit.

If you've started mallex in batch mode, it creates the run time lexicon file from the base form lexicon file. If the lexicons are very big or the allomorph rules are very complex, this can take some minutes. After creation, mallex quits.

You can use the following command line options when you start mallex:
``-interactive'' or ``-i'' runs mallex in interactive mode.
``-readable'' or ``-r'' runs mallex in batch mode and outputs the allomorph lexicon in readable form on the standard output stream.

3.5  The Program ``malmake''

The program malmake reads a project file, checks whether all grammar files needed exist, and translates all grammar files that have not yet been translated or whose source files have changed since they were translated. malmake itself calls the programs malsym, mallex and malrul if needed. An example: assume you have written a morphology grammar whose grammar files are bundled in a project file ``english.pro'':
sym: rules/english.sym

all: rules/english.all

lex: rules/english.lex lex/adjectives.lex
lex: lex/particles.lex lex/suffixes.lex lex/verbs.lex
lex: lex/nouns.lex lex/abbreviations.lex lex/numbers.lex

mor: rules/english.mor

mallex: set hidden +semantics +syntax

malaga: set hidden +semantics
When executing ``malmake english.pro'' for the first time, the symbol file, the rule files and the lexicon file will be translated:
compiling "english.sym"
compiling "english.all"
compiling "english.mor"
compiling "english.lex"
project is up to date
The translation of a big lexicon can take a long time, since the allomorph rules have to be executed for each lexicon entry.
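
The decision whether a file has to be translated again can be sketched in Python as a comparison of modification times. This is an assumption about how such a check is commonly done, not a description of malmake's internals; the function name needs_translation is hypothetical.

```python
import os


def needs_translation(source_files, compiled_file):
    """Return True if compiled_file is missing or older than any source."""
    if not os.path.exists(compiled_file):
        return True                         # not yet translated
    compiled_time = os.path.getmtime(compiled_file)
    # A source that changed after the last translation forces a new one.
    return any(os.path.getmtime(src) > compiled_time for src in source_files)
```

For a lexicon, source_files would contain every file listed in the ``lex:'' lines, so that a change in any included file triggers a new translation.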

3.6  The Program ``malrul''

The program malrul translates Malaga rule files, i.e. files that have the endings ``.all'', ``.mor'' or ``.syn''. The compiled file gets the ending ``.all_c'', ``.mor_c'' or ``.syn_c'', respectively. When starting malrul, give the name of the rule file and the name of the compiled symbol file as arguments. The order of the arguments is arbitrary. Here is an example:

malrul english.mor english.sym_c

3.7  The Program ``malsym''

malsym can translate Malaga symbol files, i.e. files having the ending ``.sym'' or ``.esym''. The translated file gets the ending ``.sym_c'' or ``.esym_c''.

For example:

malsym english.sym

If you are translating an extended symbol file with the ending ``.esym'', enter the name of the compiled symbol file as an additional argument:

malsym english.esym english.sym_c

This argument is needed since extended symbol files are extensions of ordinary symbol files.

Chapter 4  The Commands of ``malaga'' and ``mallex''

Since the user interfaces of malaga and mallex are very similar and have many commands in common, we describe them in a common chapter. Commands that can be used only in malaga or only in mallex are marked by the name of the program in which they can be used.

4.1  The Command ``break''

If you want to stop the rules at a specific point, for example to take a look at the variables, you can use the command break to set breakpoints. A breakpoint is a point in the rule source text where rule execution is interrupted, so you can enter commands in debug mode. Breakpoints are only active in debug mode, that is, when you have started rule execution with a debug command or have continued rule execution with one of the commands step, next, walk, or go.

Behind the command name, break, you can give one of the following arguments:
a line number.
A breakpoint is set at this line in the current source file. If there is no statement starting at this line, the breakpoint will be set at the nearest line where a statement starts. You can, for example, set a breakpoint at line 245 in the current source file by entering the command

break 245

a file name and a line number.
A breakpoint is set at this line in this file. If there is no statement starting at this line, the breakpoint will be set at the nearest line where a statement starts. An example:

break english.syn 59

a rule name.
A breakpoint is set at the first statement in this rule. An example:

break final_rule
If the rule name or the file name is ambiguous, you can insert an abbreviation for the rule system you refer to. Put it in front of the rule name or the file name. The following abbreviations are used:
all
for allomorph rules,
mor
for morphology rules,
syn
for syntax rules.
If you omit any argument, the breakpoint is set on the current line in the current file (this is helpful in debug mode).

Every breakpoint gets a unique number once it has been set, so you can delete it later, when you do not need it any longer.

You can list the breakpoints using the command list and delete them using delete.
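
The bookkeeping behind break, list and delete can be pictured with the following Python sketch. It is purely illustrative: the class Breakpoints and its methods are hypothetical, numbering from 1 is an assumption, and so is the choice of the lower line when two statement lines are equally near.

```python
import itertools


class Breakpoints:
    """A toy breakpoint table with unique numbers."""

    def __init__(self, statement_lines):
        # statement_lines: {file name: lines at which a statement starts}
        self._statement_lines = statement_lines
        self._numbers = itertools.count(1)
        self._table = {}                    # number -> (file name, line)

    def set_breakpoint(self, file, line):
        # Snap to the nearest line where a statement starts.
        nearest = min(self._statement_lines[file],
                      key=lambda start: (abs(start - line), start))
        number = next(self._numbers)
        self._table[number] = (file, nearest)
        return number

    def list_breakpoints(self):
        return [(n, f, l) for n, (f, l) in sorted(self._table.items())]

    def delete(self, number=None):
        if number is None:                  # corresponds to "delete all"
            self._table.clear()
        else:
            del self._table[number]


points = Breakpoints({"english.syn": [10, 58, 70]})
points.set_breakpoint("english.syn", 59)    # snaps to line 58
points.set_breakpoint("english.syn", 10)
print(points.list_breakpoints())
```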

4.2  The Command ``clear-cache'' (malaga)

If you have changed your settings so that the word form cache is no longer valid, you can clear the cache using clear-cache.

4.3  The Command ``debug-entry'' (mallex)

Use debug-entry to find errors in your allomorph rules. This command works like ga, but the allomorph generation will be stopped before the first statement of the first rule is executed:
mallex> debug-entry [surface: "john", class: name]
at rule "irregular_verb"
debug> 
The prompt ``debug>'' that appears instead of ``mallex>'' indicates that mallex is currently executing the allomorph rules but has been interrupted. Since this ability has been developed to support the debugging of Malaga rules, this mode is called debug mode.

When mallex comes to the start of a new rule in debug mode (as in the example above), the name of this rule is printed. When in debug mode, you can always get the name of the current rule using the command rule.

If you're running mallex from Emacs, another Emacs window will display the source file. An arrow points to the statement that will be executed next.
  ...
allo_rule irregular_verb ($entry):

=>? $entry.class = verb;
  ...
In debug mode, you can, for example, get the variables that are currently defined (using variables or print), and you can execute statements (using step, next, walk, go, or run). If you want to quit the debug mode, just enter run. The remaining statements for generation will then be executed without interruption.

4.4  The Command ``debug-file'' (mallex)

Use the command debug-file to make the allomorph rules work on a lexicon file in debug mode. Assume you have written a lexicon file ``mini.lex'':
[surface: "m{a}n", class: noun];
[surface: "table", class: noun];
[surface: "wise", class: adjective];
To let the rules process this lexicon in debug mode, enter:

debug-file mini.lex

4.5  The Command ``debug-line'' (mallex)

Use the command debug-line to make the allomorph rules generate allomorphs for a single lexicon entry in debug mode. Assume you want to test the second line in the lexicon file ``mini.lex'':
[surface: "m{a}n", class: noun];
[surface: "table", class: noun];
[surface: "wise", class: adjective];
Enter the following line:

debug-line mini.lex 2

Then mallex stops in debug mode at the entry of the first allomorph rule that is being executed for the lexicon entry ``[surface: "table", class:noun];''.

If there is no lexicon entry at this line, the subsequent lexicon entry will be taken.
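
The fallback to the subsequent entry can be sketched in a few lines of Python. The function name entry_at_line is hypothetical, and the sketch assumes, as in ``mini.lex'', that every lexicon entry starts with ``['' and fits on a single line.

```python
def entry_at_line(lines, line_number):
    """Return the lexicon entry at line_number, or the next one after it."""
    for text in lines[line_number - 1:]:    # line numbers start at 1
        if text.strip().startswith("["):
            return text.strip()
    return None                             # no entry at or after this line


mini_lex = [
    '[surface: "m{a}n", class: noun];',
    '',
    '[surface: "table", class: noun];',
    '[surface: "wise", class: adjective];',
]
print(entry_at_line(mini_lex, 2))           # line 2 is empty, so line 3 is used
```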

4.6  The Command ``debug-mor'' (malaga)

Use the command debug-mor to find errors in your morphology combination rules. This command analyses the rest of the command line morphologically and executes the morphology combination rules in debug mode. Debug mode is explained under the command debug-entry.

4.7  The Command ``debug-node'' (malaga)

Use the command debug-node to execute the successor rules of a specific LAG state in debug mode. You must first have analysed a word or a sentence. Make malaga display the analysis tree by entering tree, move the mouse pointer to the state node you want to debug, and press the left mouse button. A window opens in which this state's category is shown. The window's title line contains the number of the state node. Use this number as argument for debug-node. The last analysis input will be analysed again; when the analysis reaches the first successor rule of the specified state, it stops and malaga switches to debug mode.

4.8  The Command ``debug-syn'' (malaga)

Use the command debug-syn to find errors in your syntax combination rules. This command analyses the rest of the command line syntactically and executes the syntax combination rules in debug mode. Debug mode is explained under the command debug-entry.

4.9  The Command ``delete''

If you want to delete a breakpoint, use the command delete with the number of the breakpoint as argument.

Enter ``delete all'' to delete all breakpoints.

4.10  The Command ``ga'' (mallex)

Use the command ga (short for ``generate allomorphs'') to generate allomorphs. This is useful for testing allomorph generation from within mallex. When you enter the command, give a lexicon entry as argument. All allomorphs that the allomorph rules generate from this entry are printed on screen. For example:
mallex> ga [surface: "john", class: name]
surf: "john", cat: [class: name, base_form: "abraham"]
If the rules create multiple allomorphs from an entry, they are displayed one after another.

4.11  The Command ``ga-file'' (mallex)

Use the command ga-file to make the allomorph rules generate allomorphs for a lexicon file. Assume you have written a lexicon file ``mini.lex'':
[surface: "m{a}n", class: noun];
[surface: "table", class: noun];
[surface: "wise", class: adjective];
To generate the allomorphs for this lexicon, enter:

ga-file mini.lex

This will produce a readable allomorph file whose name ends in ``.cat'' (for categories); for ``mini.lex'' its name will be ``mini.lex.cat'':
surf: "man", cat: [class: noun, syn: singular]
surf: "men", cat: [class: noun, syn: plural]
surf: "table", cat: [class: noun]
surf: "wise", cat: [class: adjective, restr: complete]
surf: "wis", cat: [class: adjective, restr: inflect]

4.12  The Command ``ga-line'' (mallex)

Use the command ga-line to make the allomorph rules generate allomorphs for a single lexicon entry. Assume you want to test the second line in the lexicon file ``mini.lex'':
[surface: "m{a}n", class: noun];
[surface: "table", class: noun];
[surface: "wise", class: adjective];
Enter the following line:

ga-line mini.lex 2

Then mallex generates allomorphs for the lexicon entry ``[surface: "table", class:noun];''.

If there is no lexicon entry at this line, the subsequent lexicon entry will be taken.

4.13  The Command ``get''

This command is used to query settings of malaga or mallex. Enter it together with the name of the option whose setting you want to know. The possible options are described in the next chapter. If you just enter ``get'', all settings will be shown.

4.14  The Command ``go''

This command can only be executed in debug mode. The rule execution will be resumed and continued until a breakpoint is met or the rules have been executed completely.

4.15  The Command ``help''

Use this command to get a list of the commands you can use. If you give the name of a command or an option as argument, a short explanation of this item will be printed. If a name represents a command as well as an option, prepend ``command'' or ``option'' to it.

4.16  The Command ``info'' (malaga)

This command gives you information about the morphology or syntax rules you are using.

4.17  The Command ``list''

If you enter the command list, all breakpoints are listed. For each breakpoint, its number, the name of the source file and the source line are shown.

4.18  The Command ``ma'' (malaga)

The command ma (for morphological analysis) starts a word form analysis. Give the word form that you want to be analysed as argument:
malaga> ma house
Malaga will show the results automatically, and it will also show the analysis tree automatically if you specified it using the tree option. You can look at the results using result or at the entire analysis tree using tree.

If you do not enter a word form behind the command ma, malaga re-analyses the last input.

4.19  The Command ``ma-file'' (malaga)

The command ma-file can be used to analyse files that contain word lists. A word list consists of a number of word forms, each word form on a line on its own. There may be empty lines in a word list. The following example is a word list called ``word-list'':
table
men's
blue
handicap
To analyse this word list, enter:

ma-file word-list result

This will produce a file ``result'' that contains the analysis results. If the second argument is missing, the result will be written to a file whose name ends in ``.cat'' (for categories); for ``word-list'', its name will be ``word-list.cat'':
1: "table": [class: noun, ...]
2: "men's": [class: noun, ...]
3: "blue": [class: noun, ...]
3: "blue": [class: adjective, ...]
3: "blue": [class: name, ...]
4: "handicap": unknown
The number at the line start represents the line number of the analysed original word form. The output format can be changed by using the commands output-format and unknown-format.

If a runtime error occurs during the analysis of a word, the error message will be inserted into the result file, and the next word will be processed.

After the analysis, some statistics will be printed: the number of analysed and recognised word forms, the average number of results per word form, and the average number of word forms that have been analysed per second (if the analysis took long enough).
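
If you want to post-process a result file yourself, counts like those above can be recomputed from its lines. The following Python sketch assumes the output format shown in the example (line number, quoted surface, then a category or ``unknown''); since the format can be changed with output-format and unknown-format, this is only valid for that layout. The name summarise is hypothetical, and taking the average over recognised word forms only is an assumption.

```python
import re

# N: "surface": category   (or: N: "surface": unknown)
RESULT_LINE = re.compile(r'^(\d+): "(.*?)": (.*)$')


def summarise(result_lines):
    """Return (analysed, recognised, average results per recognised form)."""
    results = {}                            # (line number, surface) -> categories
    for line in result_lines:
        match = RESULT_LINE.match(line)
        if match is None:
            continue                        # e.g. an inserted error message
        number, surface, category = match.groups()
        categories = results.setdefault((int(number), surface), [])
        if category != "unknown":
            categories.append(category)
    analysed = len(results)
    recognised = sum(1 for categories in results.values() if categories)
    total = sum(len(categories) for categories in results.values())
    average = total / recognised if recognised else 0.0
    return analysed, recognised, average


sample = [
    '1: "table": [class: noun, ...]',
    '2: "men\'s": [class: noun, ...]',
    '3: "blue": [class: noun, ...]',
    '3: "blue": [class: adjective, ...]',
    '3: "blue": [class: name, ...]',
    '4: "handicap": unknown',
]
print(summarise(sample))
```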

4.20  The Command ``mg'' (malaga)

Use the command mg to generate all word forms that consist of a specified set of allomorphs. For example, the command

mg 3 un able believe

generates all word forms that consist of up to three allomorphs, where only the specified allomorphs (``un'', ``able'', and ``believe'') are used. The word forms are numbered from 1 onward, but different analyses of the same word form get the same index. The output will look like this:
1: "able"
2: "believe"
3: "unable"
4: "unbelieveable"
Please note that generation does not know of filters, pruning rules and default rules.
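
The idea behind mg can be sketched as a brute-force enumeration in Python: try every sequence of up to N of the given allomorphs and keep those that are accepted as word forms. This is not necessarily how malaga proceeds internally. The function generate is hypothetical, and the predicate is_word_form is a stand-in for the morphology rules, here replaced by a fixed toy set.

```python
from itertools import product


def generate(max_allomorphs, allomorphs, is_word_form):
    """Number all accepted surfaces built from at most max_allomorphs parts."""
    surfaces = []
    for length in range(1, max_allomorphs + 1):
        for combination in product(allomorphs, repeat=length):
            surface = "".join(combination)
            # Different analyses of one surface share a single index.
            if is_word_form(combination) and surface not in surfaces:
                surfaces.append(surface)
    return list(enumerate(surfaces, start=1))


# Toy stand-in for the morphology rules.
ACCEPTED = {("able",), ("believe",), ("un", "able"), ("un", "believe", "able")}

for index, surface in generate(3, ["un", "able", "believe"],
                               lambda combination: combination in ACCEPTED):
    print(f'{index}: "{surface}"')
```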

4.21  The Command ``next''

This command can only be executed in debug mode. The rule execution will be resumed and continues until a different source line is met or until the rules have been executed completely. It is like step, but subrules will be executed without interruption. If you specify a number as argument, the command will be repeated as often as specified.

4.22  The Command ``output''

This command prints the results of the last analysis or allomorph generation as ordinary text. The output format can be changed by using the commands allo-format (for mallex), output-format, and unknown-format (for malaga).

4.23  The Command ``print''

You can only use the command print in debug mode or if the previous analysis has stopped with an error in the combination rules. Using this command, you get the values of all Malaga variables currently defined. The variables will be printed in the order of their definitions:
malaga> debug-syn You are beautiful.
entering rule "Noun", start: "", next: "You"
debug> print
$sentence = [class: main_clause, parts: <>]
$word = [class: pronoun, result: S2]
You can specify any variable names (including the ``$'') as arguments to this command; you can even specify a path behind each of the variable names. In this case, only the values of the specified variables or paths are printed:
debug> print $word
$word = [class: pronoun, result: S2]
debug> print $word.class
$word.class = pronoun
If the variable values are very complex, the output of print can be confusing. Please use the command variables in this case.
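
Printing a path such as ``$word.class'' amounts to following attribute names through nested records. A minimal Python sketch, assuming records are modelled as dictionaries and using the hypothetical function name lookup:

```python
def lookup(variables, expression):
    """Resolve a variable name with an optional path, e.g. "$word.class"."""
    name, *path = expression.split(".")
    value = variables[name]
    for attribute in path:                  # descend one record level per step
        value = value[attribute]
    return value


variables = {
    "$sentence": {"class": "main_clause", "parts": []},
    "$word": {"class": "pronoun", "result": "S2"},
}
print("$word.class =", lookup(variables, "$word.class"))
```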

4.24  The Command ``quit''

Use this command to leave malaga or mallex.

4.25  The Command ``result''

If you have previously analysed a word form or a sentence using ma or sa (in malaga), or you have generated allomorphs using ga or ga-line (in mallex), you can display the results with ``result''. The analysis results will be displayed in a window on their own which is called ``Results'' for malaga and ``Allomorph'' for mallex. They are numbered from 1 onward.

If you are executing the command result for the first time, or if you have closed a Results/Allomorph window that you'd opened before, a window will open, displaying the values of all results/allomorphs of the last analysis/generation.

If there is a Results/Allomorph window currently opened, the new results/allomorphs will be displayed in this window.

The Results/Allomorph window has a menu with some commands:
Window:
Here, two items can be selected:
Export Postscript...:
Choose this item to convert the display content to Postscript and save it as a file.
Close:
Choose this item to close the Results/Allomorph window.
Font size:
Choose one of the menu's subitems to change the font size.

4.26  The Command ``rule''

This command can only be used in debug mode or after rule execution has been stopped by an error. It prints the name of the rule that is being executed; in malaga, the Start and Next surfaces are printed as well. For example:
debug> rule
at rule "flexion", start: "hous", next: "es"

4.27  The Command ``run''

This command can only be used in debug mode. The rule execution will be resumed, and the rules will be executed completely without any interruption.

If you have invoked the debug mode with the command debug-node, rule execution will be stopped again when another Next item is analysed.

4.28  The Command ``sa'' (malaga)

If you have started malaga with a syntax file in your command line or in the project file, you can start syntactic analyses using the command sa (short for syntactic analysis). Put the sentence you want to be analysed as argument behind the command name:
malaga> sa The man is in town.
Malaga will show the results automatically, and it will also show the analysis tree automatically if you specified it using the tree option. You can look at the results using result or at the entire analysis tree using tree.

If you do not enter a sentence behind the command sa, malaga re-analyses the last input.

4.29  The Command ``sa-file'' (malaga)

Using the command sa-file, you can analyse files that contain sentence lists. In a sentence list, each sentence stands in a line on its own; empty lines are permitted. Here is an example, a sentence list named ``sentence-list'':
He sleeps.
He slept.
He has slept.
He had slept.
To analyse this sentence list, enter:

sa-file sentence-list result

This will produce a file ``result'' that contains the analysis results. If the second argument is missing, the result will be written to a file whose name ends in ``.cat'' (for categories); for ``sentence-list'', its name will be ``sentence-list.cat''.
1: "He sleeps.": [functor: [syn: <S3>, sem: <"sleep">], 
                  arguments: <[syn: S3, sem: "definite pronoun"]>]
2: "He slept.": [functor: [syn: <S3>, sem: <"sleep">], 
                 arguments: <[syn: S3, sem: "definite pronoun"]>]
3: "He has slept.": [functor: [syn: <S3>, sem: <"have", "sleep">], 
                     arguments: <[syn: S3, sem: "definite pronoun"]>]
4: "He had slept.": [functor: [syn: <S3>, sem: <"have", "sleep">], 
                     arguments: <[syn: S3, sem: "definite pronoun"]>]
The number at the line start represents the line number of the analysed original sentence. The output format can be changed by using the commands output-format and unknown-format.

If a runtime error occurs during the analysis of a sentence, the error message will be inserted into the result file, and the next sentence will be processed.

After the analysis, some statistics will be printed: The number of analysed and recognised sentences, the average number of results per sentence, and the average number of sentences that have been analysed per second (if the analysis took long enough).

4.30  The Command ``set''

This command is used to change the settings of malaga or mallex. The command line ``set option argument'' changes option to argument.

If you want to get the current state of an option, use the command get. Options can also be set in the project file. The possible options are described in the next chapter.

4.31  The Command ``sg'' (malaga)

Use sg to generate sentences that are composed of a specified set of word forms. For example, if you enter

sg 3 . ? he she sleeps

all sentences are generated that consist of up to three word forms and use only the specified word forms (``.'', ``?'', ``he'', ``she'', and ``sleeps''). The sentences are numbered from 1 onward, but different analyses of the same sentence get the same index. The output looks like this:
malaga> sg 3 . ? he she sleeps
1: "he sleeps ."
2: "he sleeps ?"
3: "she sleeps ."
4: "she sleeps ?"
Please note that generation does not know of filters, pruning rules and default rules.
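
The search space that sg covers can be pictured with a brute-force sketch in Python. This is not how malaga generates sentences (there, the grammar rules drive generation); the predicate ``toy_accepts'' is a hypothetical stand-in for a grammar, and the function name ``generate'' is invented for this illustration.

```python
from itertools import product

def toy_accepts(words):
    # Hypothetical stand-in for a grammar: pronoun + "sleeps" + punctuation.
    return (len(words) == 3
            and words[0] in ("he", "she")
            and words[1] == "sleeps"
            and words[2] in (".", "?"))

def generate(max_len, vocabulary, accepts):
    """Enumerate every sequence of up to max_len word forms; keep the accepted ones."""
    results = []
    for length in range(1, max_len + 1):
        for words in product(vocabulary, repeat=length):
            if accepts(words):
                results.append(" ".join(words))
    return results

sentences = generate(3, [".", "?", "he", "she", "sleeps"], toy_accepts)
for index, sentence in enumerate(sentences, 1):
    print(f'{index}: "{sentence}"')
```

With this toy predicate, the sketch prints the same four sentences as the sg example above.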

4.32  The Command ``step''

This command can only be executed in debug mode. The rule execution will be resumed and continues until a different source line is met or until the rules have been executed completely. If you specify a number as argument, the command will be repeated as often as specified.

4.33  The Command ``trace''

If you are executing your rules in debug mode or the rules were interrupted by an error, this command shows where rule execution currently stopped. If it stopped in a subrule, all calling rules are also shown.
debug> trace
line 23 in file "dmm-deutsch.syn", rule "fill_valencies"
line 391 in file "dmm-deutsch.syn", rule "main_clause_end"
This means, rule execution stopped in line 23 of ``dmm-deutsch.syn'', in rule ``fill_valencies''. This subrule was called from line 391 in ``dmm-deutsch.syn'', in rule ``main_clause_end''.

4.34  The Command ``transmit'' (malaga)

4.35  The Command ``tree'' (malaga)

If you've started a grammatical analysis using one of the commands ma or sa (or their debug variants), you can make malaga display the result by entering

tree

If the analysis has not yet finished (in debug mode or in case of an error), an intermediate result will be shown.

If you're executing the command tree for the first time, or if you've closed the Tree window before, a new tree window will open in which the current analysis tree will be displayed.

If there is already a Tree window open, the new analysis tree will be displayed in this window.

In the upper left corner of the Tree window, you will see the sentence or the word form that has been analysed. Below, the analysis tree is displayed. An analysis path always follows the edges from the left to the right.

A circle node stands for a LAG state, a two-circle node stands for an end state.

Above each edge, the Next surface that has been read in by the corresponding rule application is displayed. On the bottom of an edge, you'll see the name of the applied rule.

You can click on a node using the left mouse button. Then another window will open, namely the Path window. The Path window displays the surface, the category and the successor rules of the state you've clicked on. The node will be highlighted by a thicker border. If you've already clicked on a node, you can click on one of its successor nodes using the right mouse button. Then all rule applications, from the state clicked on previously up to the state clicked on this time, will be displayed in the Path window. The corresponding path will be highlighted in the Tree window.

If you're clicking on a Next surface using the left mouse button, the surface and its category will be displayed in the Path window.

You can also click on rule names using the left mouse button. Then the corresponding rule application will be displayed in the Path window, i.e. the Start, Next and Result surface, the Start, Next and Result category, and the successor rules.

There are some commands that can be started from the Tree menu bar:
Window:
  Here you can select from two menu items:
  Export Postscript...:
    Convert the displayed analysis tree to a Postscript file.
  Close:
    Close the Tree window.
Font size:
  Select an item in this menu to adjust the font size.
View:
  Specify which nodes of the analysis tree are actually displayed.
  Result paths only:
    Only the nodes that are part of a complete analysis are displayed.
  All but dead ends:
    All analysis states are displayed.
  All nodes:
    All analysis states are displayed, and also rectangular nodes for rule applications that did not succeed (dead ends).
Result:
  Select an end state to display in the Path window.
  First result:
    Display the first end state.
  Previous result:
    If there is an end state displayed in the Path window, jump to the previous one.
  Next result:
    If there is an end state displayed in the Path window, jump to the next one.
  Last result:
    Display the last end state.
The Path window has its own menu bar, which contains the menus Window, Font size and Result with the same menu items as the corresponding menus in the Tree window.

4.36  The Command ``variables''

Use this command if you want to examine the values of the currently defined variables. They will be displayed in a window of their own. You do not need to give any arguments, but you can only execute this command if malaga is in debug mode or if the previous analysis has been stopped by an error in the rules.

If you are executing the command variables for the first time, or if you have closed a Variables window that you'd opened before, a window will open, displaying the values of all variables currently defined.

If there is a Variables window currently opened, the new variable contents will be displayed in this window.

The Variables window has a menu with some commands:
Window:
  Here, two items can be selected:
  Export Postscript...:
    Choose this item to convert the variable display to Postscript and save it as a file.
  Close:
    Choose this item to close the Variables window.
Font size:
  Choose one of the menu's subitems to change the font size.
Variables:
  Show selected variables:
    Choose one of the menu's subitems (variable names) to hide (or show) the corresponding variable.
  Show all variables:
    Choose this item to display all variables that are currently defined.
  Show no variables:
    Choose this item to suppress the display of all defined variables.

4.37  The Command ``walk''

This command works in debug mode only. The rule execution will be continued and stopped again as soon as a new rule is executed, a breakpoint is met or there are no more rules to execute.

Chapter 5  The Options of ``malaga'' and ``mallex''

The programs malaga and mallex share some of their options, so we describe them in a common chapter. Options can be set using the command set, and you can get the current value of an option using get. Options that can be used only in malaga or only in mallex are marked by the name of the program in which they can be used.

5.1  The Option ``alias''

With alias, you can define abbreviations for longer command lines. As arguments, give a name and an expansion, that is a command line which the name will stand for. If the expansion contains spaces, enclose it in double quotes. Omit the expansion if you want to delete an existing abbreviation.

If you type in the name of an alias at your command line, its expansion will be executed.

Aliases cannot be nested.

5.2  The Option ``allo-format'' (mallex)

With allo-format, you can change the output format for the generated allomorphs. Enter a format string as argument. If the format string contains spaces, enclose it in double quotes. If the argument is an empty string (""), no allomorphs will be shown.

In the format string, the following sequences have a special meaning:
``%c'':
will be replaced by the allomorph category.
``%n'':
will be replaced by the allomorph number.
``%s'':
will be replaced by the allomorph surface.

5.3  The Option ``cache-size'' (malaga)

Malaga has a cache for word forms. You can set the cache size, i.e. the maximum number of words in the cache, to n with ``set cache-size n''. If you set the cache size to 0, the cache is deactivated.

5.4  The Option ``display''

If you want to use a program that shows the Malaga trees, results or variables graphically, set the command line that starts this program via the display option. We recommend setting it in your .malagarc file.
set display "wish ~/malaga/tcl/display.tcl"

5.5  The Option ``hidden''

Some grammars can produce very large categories, so it can be useful not to show the values of some specified attributes. To achieve this, use the option hidden. You can give any number of arguments to this option. The following arguments are available:
``+attribute_name'': The specified attribute name will be put in parentheses if it occurs in a value; the attribute value will not be shown.
``-attribute_name'': The specified attribute will be shown completely again in the future.
``none'': All attributes will be shown completely again in the future.
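
The effect of hiding an attribute can be sketched in Python as follows. This is only a model of the behaviour described above: records are represented as dicts and lists as Python lists, and the function name ``show'' and the exact output layout are assumptions made for this illustration, not malaga's actual display code.

```python
def show(value, hidden=frozenset()):
    """Render a nested value; attributes named in `hidden` lose their value."""
    if isinstance(value, dict):                      # a record
        parts = []
        for name, attr_value in value.items():
            if name in hidden:
                parts.append(f"({name})")            # name in parentheses, value suppressed
            else:
                parts.append(f"{name}: {show(attr_value, hidden)}")
        return "[" + ", ".join(parts) + "]"
    if isinstance(value, list):                      # a list
        return "<" + ", ".join(show(item, hidden) for item in value) + ">"
    return str(value)                                # a simple value

category = {"functor": {"syn": ["S3"], "sem": ["sleep"]}}
print(show(category))
print(show(category, hidden={"sem"}))
```

The second call corresponds to the state after ``set hidden +sem''.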

5.6  The Option ``mor-out-filter'' (malaga)

Use the option mor-out-filter to switch the morphology output-filter on or off:
``set mor-out-filter yes'' activates the filter;
``set mor-out-filter no'' deactivates the filter.

5.7  The Option ``output''

In malaga, you can use the output option to execute the output command each time you invoke an analysis with ma or sa. In mallex, you can use the output option to execute the output command each time you invoke an allomorph generation with ga or ga-line. Set it in one of the following ways:
``set output on'': The output command will be executed after each analysis or generation.
``set output off'': The output command will not be executed automatically.

5.8  The Option ``output-format'' (malaga)

With output-format, you can change the output format for analysed items that have been recognised. Enter a format string as argument. If the format string contains spaces, enclose it in double quotes. If the argument is an empty string (""), no recognised forms will be shown.

In the format string, the following sequences have a special meaning:
``%c'':
will be replaced by the result category of the analysis.
``%l'':
will be replaced by the line number of the analysed form.
``%n'':
will be replaced by the number of analysis states for this form.
``%r'':
will be replaced by the reading index (the results for a form are indexed from 1 to the number of results).
``%s'':
will be replaced by the surface.
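
The substitution of format sequences can be sketched in Python. The function name ``expand_format'' and the sample field values are assumptions for this illustration. How malaga treats sequences it does not know is not stated here; the sketch simply leaves them unchanged.

```python
import re

def expand_format(fmt, fields):
    """Replace each %x sequence that has an entry in `fields`; leave others untouched."""
    def substitute(match):
        key = match.group(1)
        return str(fields[key]) if key in fields else match.group(0)
    return re.sub(r"%(.)", substitute, fmt)

# Hypothetical values for one analysed sentence.
fields = {"l": 2, "r": 1, "n": 7, "s": "He slept.", "c": "[syn: S3]"}
print(expand_format('%l: "%s": %c', fields))
```

The same mechanism applies to allo-format and unknown-format, each with its own set of sequences.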

5.9  The Option ``pruning'' (malaga)

In your syntax rules, you may have specified a pruning rule that can prune the syntax analysis tree, i.e. it can reduce the number of parallel paths. If you want this pruning rule to be executed, use the option pruning. Use one of the following arguments:
``set pruning on'' activates the pruning rule;
``set pruning off'' deactivates the pruning rule.

5.10  The Option ``result''

In malaga, you can use the result option to execute the result command each time you invoke an analysis with ma or sa. In mallex, you can use the result option to execute the result command each time you invoke an allomorph generation with ga or ga-line.

Set it in one of the following ways:
``set result on'': The result command will be executed after each analysis or generation.
``set result off'': The result command will not be executed automatically.

5.11  The Option ``robust'' (malaga)

With this option, you can specify whether you want to run a robust-rule for the word forms that could not be recognised by the LAG rules. The robust-rule gets the surface of an unknown word form as its parameter, and it can create one or more results by executing the result statement.
``set robust on'' enables this function;
``set robust off'' disables it.

5.12  The Option ``sort-records''

There are different ways to determine the order in which the attributes of a record are printed. With sort-records, you can choose between three order schemes:
``set sort-records internal'': The attributes will be printed in the order they have internally.
``set sort-records alphabetic'': The attributes will be ordered alphabetically by their names.
``set sort-records definition'': The attributes will be ordered by their names; the order is the same as in the symbol table.
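
The three order schemes can be compared with a small Python sketch. It is a model only: a dict stands in for a record (its insertion order plays the role of the internal order), and the function name ``order_attributes'' and the sample symbol table are assumptions for this illustration.

```python
def order_attributes(record, scheme, symbol_table=()):
    """Return the attribute names of `record` in the order they would be printed."""
    names = list(record)                      # dict order stands in for the internal order
    if scheme == "internal":
        return names
    if scheme == "alphabetic":
        return sorted(names)
    if scheme == "definition":
        position = {name: index for index, name in enumerate(symbol_table)}
        return sorted(names, key=lambda name: position[name])
    raise ValueError(f"unknown scheme: {scheme}")

record = {"syn": "S3", "class": "pron", "sem": "he"}
symbols = ["sem", "class", "syn"]             # hypothetical order of definition
for scheme in ("internal", "alphabetic", "definition"):
    print(scheme, order_attributes(record, scheme, symbols))
```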

5.13  The Option ``switch''

Malaga rules can query simple Malaga values (switches) that you can change during run time. Use the option switch to change the values:
``set switch name value'' sets the switch name, which must be a symbol, to value, which can be any Malaga value.

5.14  The Option ``syn-in-filter'' (malaga)

Use the option syn-in-filter to switch the syntax input-filter on or off:
``set syn-in-filter yes'' activates the filter;
``set syn-in-filter no'' deactivates the filter.

5.15  The Option ``syn-out-filter'' (malaga)

Use the option syn-out-filter to switch the syntax output-filter on or off:
``set syn-out-filter yes'' activates the filter;
``set syn-out-filter no'' deactivates the filter.

5.16  The Option ``transmit'' (malaga)

If you want to use the transmit function in malaga, you have to set a command line that starts the transmit process using the transmit option. Here is an example:
set transmit "my_transmit_program"

5.17  The Option ``tree'' (malaga)

You can use tree to make malaga execute the tree command each time you invoke an analysis with ma or sa. Set it in one of the following ways:
``set tree on'': The tree command will be executed after each analysis.
``set tree off'': The tree command will not be executed automatically.

5.18  The Option ``unknown-format'' (malaga)

With unknown-format, you can change the output format for analysed items that have not been recognised. Enter a format string as argument. If the format string contains spaces, enclose it in double quotes. If the argument is an empty string (""), no unrecognised forms will be shown.

In the format string, the following sequences have a special meaning:
``%l'':
will be replaced by the line number of the analysed form.
``%n'':
will be replaced by the number of analysis states for this form.
``%s'':
will be replaced by the surface.

5.19  The Option ``variables''

When malaga or mallex stops in debug mode while executing a Malaga rule, it can automatically show the variables that are defined at this point. Use the option variables to enable this behaviour.
``set variables on'': The variables command will be executed each time when malaga or mallex stops in debug mode.
``set variables off'': The variables command will not be executed automatically.

Chapter 6  Definition of the Programming Language Malaga

6.1  Characterisation of Malaga

A Malaga rule file closely resembles a program in a language like Pascal or C (of course, those languages do not have a Left Associative Grammar formalism built in). A Malaga source file must be translated before execution, just as in compiled languages. The generated Malaga code, however, is not machine code but an intermediate code that has to be executed (interpreted) by an analysis program.

We may characterise Malaga as follows, as far as programming structures and data structures are concerned:
structured values:
The basic values in Malaga are symbols (names that can be used e.g. for categories or subcategories), numbers (floating point numbers), and strings. Values can be combined to ordered lists or records (also known as feature structures). A value in a list or a record can be a list or a record itself. An ``ambiguous'' symbol like ``singular_plural'' can be assigned a list of symbols like ``<singular, plural>''; such a symbol is called a multi symbol.

structured statements:
In Malaga, the concept of statement blocks is implemented in a similar way as it is in the programming language Pascal. There are structured control statements to select or repeat a statement sequence. A variable is always defined locally, i.e. it only exists from the point where it has been defined up to the end of the statement sequence in which it has been defined.

no type restrictions:
Any value can be assigned to a variable and the programmer can freely define the structure of values.

no side effects:
Malaga is, unlike programming languages like Pascal or C, free of side effects. If a variable gets a value, no other variable will be changed. Analysis paths are independent of each other.

termination:
A Malaga grammar that contains no recursive subrules and no repeat statements is guaranteed to terminate, i.e. it can never hang in a loop.

variables:
In a define statement, a variable is defined and gets an initial value. Use an assignment to set a variable that has already been defined to a new value.

operators:
Many generative grammar theories and linguistic programming languages use the concept of unification of feature structures. Malaga does not use unification, but it offers some operators to build lists or records (feature structures) explicitly. Since Malaga does without unification, analyses are much faster.

6.2  Malaga Source Texts

Source texts in Malaga are format-free; this means that between lexical symbols (strings, identifiers, keywords, numerals and symbols such as ``+'', ``~'' or ``:='') there may be blanks or newlines (whitespaces) or comments. Between two identifiers or two keywords there must be at least one whitespace to separate them syntactically.

In this documentation, the syntax of the source text components is defined formally in EBNF notation. The EBNF lines are printed in typewriter style and headed by ``$$''.

6.2.1  Comments

$$ Comment ::= "#" {printing_char} .
A comment may be inserted wherever whitespace may be inserted. A comment begins with the symbol ``#'' and extends to the end of the line. Comments are ignored.

6.2.2  The include Statement

$$ Include ::= "include" String ";" .
A Malaga file may contain the statement
include "filename";
In a rule file, it can stand wherever a rule can stand. In lexicon files, it can stand in place of a value; in symbol files, it can replace a symbol definition. The text of the included file is inserted verbatim at the very location where the include statement occurs. The file name has to be stated relative to the directory of the file that contains the include statement.

6.2.3  Identifiers

$$ Identifier ::= (Letter | "_" | "&" | "|") {Letter | Digit | "_" | "&" | "|"} .
In Malaga, names for variables, constants, symbols, and rules (see below for explanation) are called identifiers. An identifier may consist of uppercase and lowercase characters, the underscore ``_'', the ampersand ``&'', the vertical bar ``|'', and, from the second character on, also of digits. Uppercase and lowercase characters are not distinguished, i.e., Malaga is not case-sensitive. Malaga keywords must not be used as identifiers. A variable name must start with a ``$'', a constant name must start with a ``@''. The same identifier may be used as variable name, constant name, symbol name, or rule name independently. Malaga can distinguish them by the context in which they occur.

Valid identifiers would be ``Noun'', ``noun'' (the same as the first), ``R2D2'', ``Vb_aux'', ``A|G|D'', ``_INF''. Identifiers like ``2Noun'', ``Verb.Frame'', ``OK?'', ``_ INF'' are not valid.
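
The identifier rules can be checked against the examples with a short Python sketch. It assumes that ``Letter'' means the ASCII letters only, and it does not test for Malaga keywords; the function names are invented for this illustration.

```python
import re

# Assumption: "Letter" is restricted to ASCII letters in this sketch.
IDENTIFIER = re.compile(r"[A-Za-z_&|][A-Za-z0-9_&|]*")

def is_identifier(text):
    """True if the whole text has the shape of an identifier (keywords are not checked)."""
    return IDENTIFIER.fullmatch(text) is not None

def same_identifier(first, second):
    """Identifiers are not case-sensitive, so compare them case-folded."""
    return first.casefold() == second.casefold()

for name in ("Noun", "R2D2", "Vb_aux", "A|G|D", "_INF", "2Noun", "Verb.Frame", "OK?", "_ INF"):
    print(name, is_identifier(name))
print(same_identifier("Noun", "noun"))
```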

6.3  Values

Malaga expressions can have values with very complex structures. To describe how those values can be composed from simple values a few rules suffice. Simple values in Malaga are symbols, numbers, and strings, which can be composed to form records and lists.

6.3.1  Symbols

$$ Symbol ::= Identifier .
The central data type in Malaga is the symbol. It is used for describing syntactic or semantic properties of an allomorph, a word, or a sentence. A symbol is an identifier like ``Verb'', ``reflexive'', ``Sing_1''. The symbols ``nil'', ``yes'', ``no'', ``symbol'', ``string'', ``number'', ``list'', and ``record'' are predefined and have special meanings.

6.3.2  Numbers

$$ Number ::= ["-"] Digit {Digit} ["." Digit {Digit}] ["E" ["+" | "-"] Digit {Digit}] .
A number in Malaga consists of an optional ``-'' sign, an integer part, an optional fractional part and an optional exponent of the form ``E[+|-]n''. There must be a dot between the integer part and the fractional part. Examples: ``0'', ``1'', ``1.0'', ``-13.75'', ``1.2E-5''.
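
The number format described in the paragraph above can be written as a regular expression. The following Python sketch follows that prose description; it assumes that only an uppercase ``E'' introduces the exponent, and the function name ``parse_number'' is invented for this illustration.

```python
import re

# Optional sign, integer part, optional fraction, optional exponent E[+|-]n.
NUMBER = re.compile(r"-?[0-9]+(\.[0-9]+)?(E[+-]?[0-9]+)?")

def parse_number(text):
    """Return the value of a number literal, or raise ValueError if it is malformed."""
    if NUMBER.fullmatch(text) is None:
        raise ValueError(f"not a valid number: {text!r}")
    return float(text)

for literal in ("0", "1", "1.0", "-13.75", "1.2E-5"):
    print(literal, parse_number(literal))
```

Literals such as ``1.'' or ``.5'' are rejected, because both the integer part and, after a dot, the fractional part need at least one digit.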

6.3.3  Strings

$$ String ::= '"' {printing_char_except_double_quotes | '\"' | '\\'} '"' .
A string may consist of any number of characters (it may also be empty). It must be enclosed in double quotes and must not extend over more than one line. Within the double quotes there may be any combination of printable characters; the backslash ``\'' and the double quote, however, must each be preceded by a ``\'' (escape character). Examples: "Hello", "He says: \"Great\"".

6.3.4  Lists

$$ List ::= "<" [Expression {"," Expression}] ">" .
A list is an ordered sequence of values. The values are separated by commas and enclosed in angle brackets:
<element1, element2, ...>
A list may as well be empty. The elements in a list may be arbitrarily complex; they may also be lists or records.

6.3.5  Records

$$ Record ::= "[" [Symbol-Value-Pair {"," Symbol-Value-Pair}] "]" .
$$ Symbol-Value-Pair ::= Expression ":" Expression .
A record is a collection of attributes. An attribute consists of a symbol, the attribute name, and an associated attribute value, which can be an arbitrary Malaga value. The attribute name serves as an access key for the attribute value, so all attributes in a record must have different names.

Records are noted down as follows:
[name1: value1, name2: value2, ...]
where each name denotes an attribute name and the value following it denotes the associated attribute value. Example: ``[Class: Verb, Reg: Reg, Val: dirObj]''.

A record with no attributes, ``[]'', is called the empty record.

6.4  Expressions

$$ Expression ::= ["-"] Term {("+" | "-") Term} .
$$ Term ::= Factor {("*" | "/") Factor} .
$$ Factor ::= Value {"." Value} .
$$ Value ::= Symbol | String | Number | List | Record | Constant 
$$           | Subrule-Invocation | Variable | "(" Condition ")" .
$$ Constant-Expression ::= Expression .
An expression is the form in which a value is used in Malaga. Values can be written as follows:
[Surf: "he", Class: Pron, Case&Number: S3]
Variables (these are placeholders for values within a rule) can as well be used as expressions:
$Pron
Furthermore, constants (placeholders for values in a rule file) can be used as expressions:
@combination_table
All three forms can be mixed:
[Surf: "he", Class: Pron, Case&Number: $result]
Furthermore, there are operators which modify values or combine two values to form a new value. Using those operators complex values can be composed. All operators work left-associatively and have a different priority (an operator with higher priority is applied before one with lower priority):
operator   priority
.          3
*, /       2
+, -       1
The order in which the operators are to be applied can be changed by bracketing with round parentheses ``()''.

6.4.1  Variables

$$ Variable ::= "$" Identifier .
A variable is marked by a ``$'' preceding its name. The name may be any valid identifier. A variable is defined by the define statement; it receives a value and may from this point on be used in all expressions within the statement sequence. In such a statement sequence (and all subordinated statement sequences) a variable with the same name must not be defined again.

6.4.2  Constants

$$ Constant ::= "@" Identifier .
A constant is marked by a ``@'' preceding its name. The name may be any valid identifier. A constant is defined by a constant definition in a rule file, outside a rule. It is assigned a value and can be used in subsequent rules and constant definitions in that rule file.

6.4.3  Subrule Invocations

$$ Subrule-Invocation ::= Rule-Name "(" Expression {"," Expression} ")" .
$$ Rule-Name ::= Identifier .
A subrule is invoked when an expression ``subrule (value1, value2, ...)'' is evaluated. The expression yields the value that is returned by the return statement in the subrule. The number of parameters in a subrule invocation must match the number of parameters in the subrule definition.

There are a number of default subrules which are predefined. They are called functions and they all take one parameter only.

6.4.4  The Function ``atoms''

The expression ``atoms(symbol)'' yields the list of atomic symbols for symbol. If symbol is not a multi-symbol, it yields the list <symbol>.

6.4.5  The Function ``capital''

The expression ``capital (string)'' yields yes if the first character of the string string is a capital letter, else it yields no.

6.4.6  The Function ``length''

The expression ``length (list)'' yields the number of elements in ``list''.

6.4.7  The Function ``multi''

The expression ``multi(list)'', where list is a list of symbols, yields the multi symbol whose atomic list corresponds to list. If list contains a single atomic symbol, this symbol will be yielded by the expression.

6.4.8  The Function ``set''

The expression ``set(list)'' yields a list which contains each element of list, but only once. That means, the list is converted to a set.
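
A Python sketch of this conversion follows. It assumes that the first occurrence of each element keeps its position, which the description above does not state; the function name ``malaga_set'' is invented for this illustration.

```python
def malaga_set(items):
    """Return the elements of `items` with duplicates removed.

    Assumption: the first occurrence of each element keeps its position.
    """
    result = []
    for item in items:
        if item not in result:     # equality test, so nested lists work as well
            result.append(item)
    return result

print(malaga_set(["sg", "pl", "sg", ["a"], ["a"]]))
```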

6.4.9  The Function ``switch''

The expression ``switch (symbol)'' yields the current value of the switch associated to ``symbol''. Use the option switch to change this value.

6.4.10  The Function ``symbol_name''

The expression ``symbol_name (symbol)'' yields the name of symbol as a string.

6.4.11  The Function ``transmit'' (malaga)

The expression ``transmit (value)'' writes value, converted to text format, to the transmit process via pipe and reads a value in text format from the transmit process via pipe. The answer is converted to the internal Malaga value format and returned as the result of the expression.

When this function is evaluated, the transmit process is started if it has not been started yet. The command line of the transmit process is specified by the option transmit.

6.4.12  The Function ``truncate''

The expression ``truncate (number)'' yields the largest integer number that is not greater than number.
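
Read literally, this definition is the mathematical floor function, which differs from rounding towards zero for negative numbers. The following Python sketch follows the sentence as written; whether malaga actually behaves this way for negative arguments is not verified here.

```python
import math

def truncate(number):
    """Largest integer that is not greater than `number`, as the text defines it."""
    return math.floor(number)

print(truncate(13.75))    # 13
print(truncate(-13.75))   # -14, whereas rounding towards zero would give -13
```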

6.4.13  The Function ``value_type''

The expression ``value_type (value)'' yields the type of value. The type information is coded as one of the symbols ``symbol'', ``string'', ``number'', ``list'', or ``record''.

6.4.14  The Operator ``.''

This operator may only be used in the following ways:

6.4.15  The Operator ``+''

This operator may only be used in the following ways:

6.4.16  The Operator ``-''

This operator may only be used in the following ways:

6.4.17  The Operator ``*''

This operator may only be used in the following ways:

6.4.18  The Operator ``/''

This operator may only be used in the following ways:

6.5  Conditions

$$ Condition ::= Comparison ({"and" Comparison} | {"or" Comparison}) .
$$ Comparison ::= ["not"] (Expression [Comparison-Operator Expression]
                  | Match-Comparison) .
$$ Comparison-Operator ::= "=" | "/=" | "~" | "/~" | "in" | "less" | "greater"
                           | "less_equal" | "greater_equal" .
A condition can either be true or false, as in ``Verb = Verb'' or ``Verb = Noun'', respectively. An expression that is evaluated to any of the symbols yes or no is a valid condition.

A condition can be used everywhere a (non-constant) value is needed. It will evaluate to yes or no. In this case, the condition must be surrounded by parentheses.

6.5.1  The Operators ``='' and ``/=''

The condition ``expr1 = expr2'' tests whether the expressions expr1 and expr2 are equal. There are several possibilities:
expr1 and expr2 are strings, symbols or numbers.
In this case expr1 and expr2 must be identical.
expr1 and expr2 are lists.
In this case expr1 and expr2 must match element by element.
expr1 and expr2 are records.
In this case expr1 must contain the same attributes as expr2 (though not necessarily in the same order).
For nested structures, equality is tested recursively.

If expr1 and expr2 do not have the same type, the test results in an error; only the symbol nil can be compared to any value.

The comparison ``expr1 /= expr2'' holds iff the comparison ``expr1 = expr2'' does not hold.
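
The equality rules can be modelled in Python as follows. The sketch makes several assumptions: the class ``Symbol'' and the functions ``kind'' and ``equal'' are invented names, records are dicts keyed by attribute name, and a type mismatch inside a nested structure is reported as an error in the same way as one at the top level.

```python
class Symbol:
    """A Malaga symbol; names are compared without regard to case."""
    def __init__(self, name):
        self.name = name.lower()
    def __eq__(self, other):
        return isinstance(other, Symbol) and self.name == other.name
    def __hash__(self):
        return hash(self.name)

NIL = Symbol("nil")

def kind(value):
    """Name the type of a modelled value."""
    for python_type, name in ((Symbol, "symbol"), (str, "string"),
                              ((int, float), "number"), (list, "list"),
                              (dict, "record")):
        if isinstance(value, python_type):
            return name
    raise TypeError(f"not a modelled value: {value!r}")

def equal(a, b):
    if a == NIL or b == NIL:                 # nil can be compared to any value
        return a == b
    if kind(a) != kind(b):
        raise TypeError("cannot compare values of different types")
    if kind(a) == "list":                    # element by element, order matters
        return len(a) == len(b) and all(equal(x, y) for x, y in zip(a, b))
    if kind(a) == "record":                  # same attributes, order is irrelevant
        return a.keys() == b.keys() and all(equal(a[key], b[key]) for key in a)
    return a == b                            # strings, symbols, numbers

print(equal({"a": 1, "b": [Symbol("x")]}, {"b": [Symbol("X")], "a": 1}))
print(equal([1, 2], [2, 1]))
```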

6.5.2  The Operators ``less'', ``less_equal'', ``greater'', ``greater_equal''

A condition of type ``expr1 operator expr2'' compares two numbers. Here, operator can have the following values:
operator        meaning
less            <
less_equal      ≤
greater         >
greater_equal   ≥
If either expr1 or expr2 is not a number, an error will be reported.

6.5.3  The Operators ``~'' and ``/~''

For a comparison ``expr1 ~ expr2'', expr1 and expr2 must be lists or symbols.

If expr1 and expr2 are symbols, the lists of their atomic symbols (atoms(expr1) and atoms(expr2)) will be used for the comparison instead of the symbols themselves.

The comparison tests whether the lists are congruent, that is, whether they have an element in common.

The comparison ``expr1 /~ expr2'' holds iff the comparison ``expr1 ~ expr2'' does not hold.
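
The congruence test can be sketched in Python. Symbols are modelled as plain strings here, the table ``MULTI_SYMBOLS'' is a hypothetical excerpt of a symbol table, and the function names are invented for this illustration. The sketch also accepts a mix of one symbol and one list, which the text above does not address.

```python
# Hypothetical symbol table excerpt: multi symbol -> list of atomic symbols.
MULTI_SYMBOLS = {"singular_plural": ["singular", "plural"]}

def atoms(symbol):
    """List of atomic symbols; a symbol that is not a multi symbol yields <symbol>."""
    return MULTI_SYMBOLS.get(symbol, [symbol])

def congruent(expr1, expr2):
    """Model of expr1 ~ expr2: true if both lists share at least one element."""
    left = atoms(expr1) if isinstance(expr1, str) else expr1
    right = atoms(expr2) if isinstance(expr2, str) else expr2
    return any(item in right for item in left)

print(congruent("singular_plural", "plural"))
print(congruent("singular", "plural"))
print(congruent(["nom", "acc"], ["acc", "dat"]))
```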

6.5.4  The Operator ``in''

The operator ``in'' can only be used in the following ways:
  1. The condition ``symbol in record'' holds iff record contains an attribute named symbol.
  2. The condition ``value in list'' holds iff value is an element of list.

6.5.5  The matches Condition (Regular Expressions)

$$ Match-Comparison ::= Expression "matches" "(" Segment {"," Segment} ")".
$$ Segment ::= [Variable ":"] Constant-Expression .
The condition
expr matches (pattern)
interprets pattern as a pattern (a regular expression) and tests whether expr matches pattern. Patterns are defined as follows:
pattern ::= alternative { ``|'' alternative }

The string must be identical with one of the alternatives.

alternative ::= { atom [ ``*'' | ``?'' | ``+'' ] }

An alternative is a (possibly empty) sequence of atoms. An atom in a pattern corresponds to a character in a string. By using an optional postfix operator it is possible to specify for any atom how often it may be repeated within the string at that location: zero times or once (``?''), at least once (``+''), or arbitrarily often, including zero times (``*'').

atom ::= ``('' pattern ``)''

A pattern may be grouped by parentheses.

atom ::= ``['' [ ``^'' ] range { range } ``]''

A character class. It represents exactly one character from one of the ranges. If the symbol ``^'' is the first one in the class, the expression represents exactly one character that is not contained in one of the ranges.

atom ::= ``.''

Represents any character.

atom ::= character

Represents the character itself.

range ::= character1 [ ``-'' character2 ]

The range contains any character with a code at least as big as the code of character1 and not bigger than the code of character2. The code of character2 must be at least as big as the code of character1. If character2 is omitted, the range only contains character1.

character ::= Any character except ``*?+[]^-.\|()''

To use one of the characters ``*?+[]^-.\|()'' as an ordinary character, precede it with a ``\'' (escape character).
You can divide the pattern into segments:
$surf matches ("un|in|im|ir|il", ".*", "(en)?")
is the same as
$surf matches ("(un|in|im|ir|il).*(en)?").
A section of the string can be stored in a variable by prefixing the respective pattern with ``variable_name:'', as in
$surf matches ($a: "un|in|im|ir|il", ".*")
The variables defined by pattern matching are only defined in the statement sequence which is executed if the pattern matching is successful. A matches condition that is negated may not have variable definitions in it.
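
The relation between a segmented pattern and a single pattern can be sketched with Python's re module, whose syntax covers the operators described above. This is only a model under stated assumptions, not Malaga's implementation: the function name matches and the representation of a segment with a variable as a (name, pattern) pair are hypothetical, and the ``$'' of the variable name is dropped because Python group names may not contain it.

```python
import re

def matches(string, *segments):
    """Model of ``string matches (segment, ...)''.

    Each segment is either a pattern or a (variable_name, pattern) pair.
    Returns a dict of variable bindings on success, or None on failure.
    """
    parts = []
    for segment in segments:
        if isinstance(segment, tuple):
            name, pattern = segment
            parts.append(f"(?P<{name}>{pattern})")   # segment stored in a variable
        else:
            parts.append(f"(?:{segment})")           # plain segment
    # The whole string must match, so fullmatch is used rather than search.
    match = re.fullmatch("".join(parts), string)
    return None if match is None else match.groupdict()
```

In this model, matches("impossible", ("a", "un|in|im|ir|il"), ".*") succeeds and binds a to "im", while matches("broken", "un|in|im|ir|il", ".*") fails and therefore defines no variable.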

6.6  The Operators not, and, and or

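Conditions can be combined logically by the operators not, and, and or. The operator not takes exactly one argument; if this argument is a complex condition, it has to be put in parentheses ``( )''.

The operators and and or may not be mixed on the same level of a condition; otherwise the order of evaluation would be ambiguous. A condition that uses one of these operators has to be put in parentheses ``( )'' if it is an argument of the other one.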

6.7  The Symbol Table

$$ Symbol-Definition ::= Symbol [":=" "<" Symbol {"," Symbol} ">"] ";".
Every symbol used in a grammar has to be defined exactly once in the symbol table. Every symbol must be followed by a semicolon:
verb; noun; adjective;
Symbols that are being defined that way are called atomic symbols. A symbol can also be defined as a multi-symbol. Then the entry for this symbol has the following format:
symbol := list;
The list for this symbol must consist of at least two atomic symbols, all different from those that have already been defined. This list will be used by the operators ``~'' and ``/~'', ``atoms'', and ``multi''. The lists in the symbol table must all be different; two lists that differ only in the order of their elements count as the same list.
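
Two of these constraints, the minimum length and the uniqueness of lists regardless of element order, can be checked mechanically. The following Python sketch is a hypothetical checker and not part of the Malaga package; it assumes the multi-symbol definitions are available as a dictionary from symbol name to list of atomic symbols.

```python
def check_multi_symbols(table):
    """Check a hypothetical {multi_symbol: [atomic symbols]} table.

    Returns a list of error messages; an empty list means no problem was found.
    """
    errors = []
    seen = {}  # frozenset of atoms -> name of the multi-symbol that uses it
    for name, atom_list in table.items():
        if len(atom_list) < 2:
            errors.append(f"{name}: needs at least two atomic symbols")
        key = frozenset(atom_list)  # ignores the order of the elements
        if key in seen:
            errors.append(f"{name}: same atoms as {seen[key]}")
        else:
            seen[key] = name
    return errors
```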

6.8  The Initial State

$$ Initial ::= "initial" Constant-Expression "," Rule-Set ";" .
$$ Rule-Set ::= "rules" (Rules {"else" Rules} | "(" Rules {"else" Rules} ")") .
$$ Rules ::= Rule-Name {"," Rule-Name} .
The initial state in a combination rule file is defined as follows:
initial value, rules rule1, rule2, ...;
The initial state specifies a category for the empty word start (or sentence start) in a combi-rule file; the rules listed after the keyword rules are applied in parallel to combine the empty word (sentence) start with the first allomorph (word form). The rules may be enclosed in parentheses.

If you want rules to be executed only if no other rule has been successful, you can put their names behind the other rules' names and write an else in front of them:
initial value, rules rule1, rule2 else rule3, rule4 else ...;
If neither of the normal rules rule1 and rule2 has been successful, rule3 and rule4 are executed. If these rules also fail, the next rules are executed, and so on.

6.9  The Constant Definition

$$ Constant-Definition ::= "define" Constant ":=" Constant-Expression ";" .
A constant definition is of the form
define @constant := expr;
The constant expression expr will be evaluated and the constant @constant will be defined to have this value. The constant must not have been defined previously. The constant is valid from this definition up to the end of the rule file.

6.10  Rules

$$ Rule ::= Rule-Type Rule-Name "(" Variable {"," Variable} ")" ":"
$$          {Statement} "end" [Rule-Type] [Rule-Name] ";" .
$$ Rule-Type ::= "allo_rule" | "combi_rule" | "end_rule" | "pruning_rule" |
$$               "robust_rule" | "input_filter" | "output_filter" | "subrule" .
A rule is a sequence of statements that is executed as a unit:
combi_rule name ($param1, $param2, ...):
    statement1
    statement2
    ...
end name;
A rule has to begin with one of the keywords allo_rule, combi_rule, end_rule, pruning_rule, robust_rule, input_filter, output_filter or subrule. The keyword is followed by the rule name and the parameter list, a list of variable names in parentheses. The variables will be assigned the parameter values when the rule is executed. The number of parameters depends on the rule type. The rule types have the following meanings:
``allo_rule ($lex_entry)'':
An allo-rule must occur exactly once in an allomorph rule file. It analyses a lexical entry and must generate one or more allomorph entries (via result). An allomorph rule has one parameter, namely the lexicon entry.
``combi_rule ($start, $next, $surf, $index)'':
Any number of combi-rules may occur in a combi-rule file. Before such a rule is processed, the next segment (either the next allomorph or the next word form) is read. The first parameter is the Start category, the second is the Next category, the third is the Next surface, and the fourth is the Next index. The third and the fourth parameter are optional. A combi-rule may state a successor rule set or accept the analysed input (both via result).
``pruning_rule ($list)'':
A pruning-rule may occur at most once in a syntax rule file. During syntax analysis, it can decide which states are still valid and which are to be deleted. The parameter is a list of categories of the states that have consumed the same input so far. The pruning-rule must execute a return statement with a list of yes- and no-symbols. Each state in $list corresponds to a symbol in the result list. If the symbol is yes, the corresponding state is preserved. If the symbol is no, the state is abandoned.
``robust_rule ($surface)'':
A robust-rule may occur at most once in a morphology rule file. If robust analysis has been switched on by the robust command, and a word form could not be recognised by the combi-rules, the robust-rule is executed with the surface of the word form as its parameter. A robust-rule can accept the word form via result.
``input_filter ($cat_list)'':
An input-filter may occur at most once in a syntax rule file. The input-filter is called after a word form has been analysed. It gets one parameter, namely the list of the analysis results, and it transforms it to one or more filtered results (via result).
``output_filter ($cat_list)'':
An output-filter may occur at most once in any rule file.
In allo-rule files:
The output-filter is called after all lexicon entries have been processed by the allo-rule. The filter is called for every allomorph surface. It gets one parameter, namely the list of the generated categories with that surface, and it transforms it to one or more filtered allomorph categories (via result).
In combi-rule files:
The output-filter is called after an item has been analysed. It gets one parameter, namely the list of the analysis results, and it transforms it to one or more filtered results (via result).
``subrule ($param1, $param2, ...)'':
Any number of subrules may occur in any rule file. A subrule can be invoked from other rules and it must return a value to this rule via return. It can have any number of parameters (at least one).
When a rule is executed, all statements in the rule are processed sequentially. After that, the rule execution is terminated. The if statement, the foreach statement, and the parallel statement may change the processing order. Special conditions apply if:
  1. A condition in a require statement does not hold. In this case the processing of the rule path is terminated. This is not an error.
  2. The fail statement was executed. This is a special case of case 1.
  3. An assert condition does not hold. In this case the processing of the whole grammar is terminated and an error message is displayed. This rule termination can be used to find categorisation or programming flaws in the rule system or in the lexicon.
  4. The error statement was executed. This is a special case of case 3.
  5. The return statement was executed in a subrule or in a pruning rule. In a subrule, this terminates the subrule in the current rule path and immediately returns to the calling rule. In a pruning rule, this terminates the pruning rule.

6.11  Statements

$$ Statement ::= Assert-Statement | Assignment
$$               | Choose-Statement | Define-Statement
$$               | Error-Statement | Fail-Statement | Foreach-Statement 
$$               | If-Statement | Parallel-Statement | Repeat-Statement
$$               | Require-Statement | Result-Statement | Return-Statement .
A rule body contains a sequence of statements.

The statements are the assignment and the statements beginning with assert, choose, define, error, fail, foreach, if, parallel, repeat, require, result, and return.

6.11.1  The assert Statement

$$ Assert-Statement ::= ("assert" | "!") Condition ";" .
The statement
assert condition;
or
! condition;
tests whether condition holds. If this is not the case, an error message with the line number in the source code is printed and the processing of all paths is terminated.

The assert statement should be used to check whether there are structural flaws in the lexicon or the rule system.

6.11.2  The Assignment

$$ Assignment ::= Variable {"." Value} 
$$                (":=" | ":=+" | ":=-" | ":=*" | ":=/") Expression ";" .
To set the value of an already defined variable to a different value, use a statement of the following form:
$var := expr;
The expression expr is evaluated and the result is assigned to the variable $var. The variable must have already been defined.

You can optionally specify a path behind the variable that is to be set by an assignment:
$var.part1.part2 := value;
In this case, only the value of ``$var.part1.part2'' will be set to value; the remainder of the variable $var will be unchanged. Each part must be an expression that evaluates to a symbol, a number or a list of symbols and numbers.

You can also use one of four other assignment operators instead of the operator ``:='': The statement ``$var :=+ value;'' is a shorthand for ``$var := $var + value;''; the same holds analogously for the assignment operators ``:=-'', ``:=*'', and ``:=/''. Here, $var may be followed by a path again.
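
The effect of an assignment with a path can be illustrated by a small model. The following Python sketch is not Malaga code: it models records as dictionaries, covers attribute names only (no list indices), and the function name assign_path is hypothetical.

```python
import copy

def assign_path(value, path, new_value):
    """Model of ``$var.part1.part2 := new_value;''.

    Records are modelled as dicts; path is a non-empty list of attribute names.
    Returns the new value of the variable; the old value is left untouched.
    """
    result = copy.deepcopy(value)
    target = result
    for part in path[:-1]:
        target = target[part]        # descend to the enclosing record
    target[path[-1]] = new_value     # replace only the addressed part
    return result
```

For a value {"cat": "noun", "agr": {"num": "sg", "case": "nom"}}, the call assign_path(value, ["agr", "num"], "pl") changes only the num attribute; cat and case keep their values.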

6.11.3  The choose Statement

$$ Choose-Statement ::= "choose" Variable "in" Expression ";" .
The choose statement chooses an element of a list. Its format is:
choose $var in expr;
For every element in the list expr a rule path is created; in this rule path the element is stored in the variable $var. Thus the number of rule paths can multiply. If, for example, expr has the value <A, B, C>, the currently processed rule path has three continuations: In the first one $var has the value A, in the second one it has the value B and in the third one it has the value C. The three paths behave independently from now on; some may fail while others may be processed successfully, and the results can be different.
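
The multiplication of rule paths can be modelled as follows. This Python sketch is only an illustration: it represents each rule path by a dictionary of its variable bindings, and the function name choose and this representation are assumptions made for the example.

```python
def choose(paths, variable, elements):
    """Model of ``choose $var in expr;'' for a list.

    paths is a list of variable bindings (dicts), one per rule path.
    Every path is continued once for every element of elements.
    """
    return [{**bindings, variable: element}
            for bindings in paths
            for element in elements]
```

Starting from one path, choose([{}], "var", ["A", "B", "C"]) yields three paths; a second choose over a two-element list would turn them into six.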

The choose statement can also be used for records. In that case, the variable $var gets a different attribute name of the record expr in each path.

The choose statement also works for numbers:

6.11.4  The define Statement

$$ Define-Statement ::= "define" Variable ":=" Expression ";" .
A define statement is of the form
define $var := expr;
The expression expr is evaluated and the result is assigned to the variable $var. The variable must not have been defined before this statement; it is defined by the statement and only exists until the statement sequence in which the definition occurs has been processed fully.

6.11.5  The error Statement

$$ Error-Statement ::= "error" String ";" .
The statement error terminates the execution of all paths and prints out a given error message string and the line of the source text.
error message;

6.11.6  The fail Statement

$$ Fail-Statement ::= "fail" ";" .
The fail statement terminates the current rule path. Its format is:
fail;

6.11.7  The foreach Statement

$$ Foreach-Statement ::= "foreach" Variable "in" Expression ":" {Statement}
$$                       "end" ["foreach"] ";" .
You may wish to manipulate all elements of a list or a record sequentially in one rule path. For this purpose, the foreach statement was introduced. It has the following format:
foreach $var in expr: statements end foreach;
Sequentially the first, second, third, ... element of the list expr are assigned to $var and the statement sequence statements is executed for each of those assignments.

Every time the statements are executed, the variable $var is defined again. Its scope is the block statements.

The foreach statement also works for records. In that case, the variable $var is assigned the first, second, ... attribute name of the record expr.

The foreach statement also works for numbers:

6.11.8  The if Statement

$$ If-Statement ::= "if" Condition "then" {Statement}
$$                  {"elseif" Condition "then" {Statement}}
$$                  ["else" {Statement}] "end" ["if"] ";" .
An if statement has the following form:
if condition1 then statements1
elseif condition2 then statements2
else     statements3
end if ;
The second line may be repeated any number of times (including zero times); the third line may be omitted.

Firstly, condition1 is evaluated. If it is satisfied, the statement sequence statements1 is executed.

If the first condition is not satisfied, condition2 is evaluated; if the result is true, statements2 is executed. This procedure is repeated for every elseif part until a condition is satisfied.

If the if condition and elseif conditions fail, the statement sequence statements3 is executed (if it exists).

After the if statement has been processed the next statement is executed.

The if after the end may be omitted.

6.11.9  The parallel Statement

$$ Parallel-Statement ::= "parallel" {Statement} {"and" {Statement}}
$$                        "end" ["parallel"] ";" .
Using the parallel statement, more than one continuation of an analysis can be generated. Its format is:
parallel statements1
and statements2
and statements3
...
end parallel;
This creates as many rule paths as there are statement sequences. In the first rule path, statements1 are executed, in the second one statements2 are executed, etc. Each rule path continues by executing the statements following the parallel statement.

The keyword parallel after the end can be omitted.
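
The branching can be modelled with one function per statement sequence. The following Python sketch is only an illustration under stated assumptions: the function name parallel, the representation of a rule path as a dictionary of bindings, and the three example branches are hypothetical; a branch returns None to model a path that is terminated.

```python
def parallel(bindings, *branches):
    """Model of a parallel statement with one function per statement sequence.

    Each branch receives a copy of the current bindings and returns the
    bindings of its rule path, or None if that path is terminated.
    """
    results = []
    for branch in branches:
        outcome = branch(dict(bindings))   # every path gets its own copy
        if outcome is not None:
            results.append(outcome)
    return results

def as_noun(b):
    b["cat"] = "noun"
    return b

def as_verb(b):
    b["cat"] = "verb"
    return b

def rejected(b):
    return None  # models a path that ends in fail
```

Here parallel({"surf": "walk"}, as_noun, as_verb, rejected) leaves two rule paths, one with the category noun and one with the category verb; both would continue with the statements after the parallel statement.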

6.11.10  The repeat Statement

$$ Repeat-Statement ::= "repeat" {Statement} "while" Condition ";" {Statement}
$$                      "end" ["while"] ";" .
You may wish to repeat a sequence of statements while a specific condition holds. This can be realised by the repeat loop. It has the following form:
repeat
statements1
while condition ;
statements2
end while;
The statements statements1 are executed. Then, condition is tested. If it holds, the statements2 are executed and the repeat statement is executed again. If condition does not hold, execution proceeds after the repeat statement.
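
The condition is tested in the middle of the loop body, so statements1 is always executed once more than statements2. The following Python sketch models this control flow; the function name repeat and the representation of the statement sequences as functions on a mutable state are assumptions made for the example.

```python
def repeat(state, statements1, condition, statements2):
    """Model of ``repeat statements1 while condition; statements2 end''."""
    while True:
        statements1(state)
        if not condition(state):
            break              # execution proceeds after the repeat statement
        statements2(state)
    return state
```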

6.11.11  The require Statement

$$ Require-Statement ::= ("require" | "?") Condition ";" .
A statement of the form
require condition;
or
? condition;
tests whether condition is true. If this is not the case, the rule path is terminated without an error message. Require statements should be used to decide whether a word start (sentence start) that has been read is grammatical according to the interpretation of the rule path.

6.11.12  The result Statement

$$ Result-Statement ::= "result" Expression ["," (Rule-Set | "accept")] ";" .
In combi rules:
The statement
result expr,
rules rule1, rule2, ...;
specifies the Result category of the rule and the successor rules. The value expr is the Result category. Behind the keyword rules the names of all successor rules are enumerated. For every successor rule that is being executed a new rule path will be created. The rule set may be enclosed in parentheses.

If you want successor rules to be executed only if no other rule has been successful, you can put their names behind the other rules' names and write an else in front of them:
rules rule1, rule2 else rule3, rule4 else ...;
If none of the normal rules (here: rule1 and rule2) has been successful, rule3 and rule4 are executed. If these rules also fail, the next rules are executed, and so on. A rule has been successful if it has executed at least one result statement.
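
The fallback behaviour of a rule set with else can be modelled as follows. This Python sketch is not Malaga's implementation: the function name apply_rule_set is hypothetical, each rule is modelled as a function that returns the list of results it has produced, and an empty list stands for a rule that has not been successful.

```python
def apply_rule_set(rule_groups, state):
    """Model of ``rules rule1, rule2 else rule3, rule4 else ...''.

    rule_groups is a list of groups; each group is a list of functions that
    take the state and return a list of results (empty if the rule fails).
    """
    for group in rule_groups:
        results = []
        for rule in group:
            results.extend(rule(state))   # all rules of one group are applied
        if results:                       # at least one rule was successful
            return results
    return []
```

A later group is only tried if every rule of all earlier groups has returned an empty list.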

In combi-rules and end-rules:
If the input is to be accepted by the result statement (and therefore no successor rules are to be called) the following format has to be used:
result expr, accept;
If this statement is reached in a rule path, the input is accepted as grammatically well-formed. The value expr is returned as the result of the morphological or syntactic analysis.

In filters and robust-rules:
The format of a result statement in a filter or robust-rule is:
result expr;
If this statement is reached, the value expr is used as a result of the executed rule.

In allo rules:
The format of the result statement in an allo rule is:
result surface, category;
It creates an entry in the allomorph lexicon. The allomorph surface surface must be a string; category is the categorical information of the allomorph.

6.11.13  The return Statement

$$ Return-Statement ::= "return" Expression ";" .
In a subrule, the return statement is of the following form:
return expr;
The value of expr is returned to the rule that invoked this subrule and the subrule execution is finished.

In a pruning rule, the return statement is of the same form. Here, expr must be a list of yes- and no-symbols. Each state in the category list, which is the pruning rule parameter, corresponds to a symbol in the result list. If the symbol is yes, the corresponding state is preserved. If the symbol is no, the state is abandoned.
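
The correspondence between states and yes/no symbols can be modelled as a filter over two parallel lists. The following Python sketch is only an illustration; the function name prune and the error checks are assumptions and do not describe how Malaga reports a malformed result list.

```python
def prune(states, verdicts):
    """Keep the states whose corresponding symbol is "yes".

    states and verdicts are parallel lists; verdicts models the list
    returned by the pruning rule.
    """
    if len(states) != len(verdicts):
        raise ValueError("one yes/no symbol is needed per state")
    if any(v not in ("yes", "no") for v in verdicts):
        raise ValueError("only the symbols yes and no are allowed")
    return [state for state, verdict in zip(states, verdicts) if verdict == "yes"]
```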

6.12  Files

A Malaga grammar system comprises several files: a symbol file, a lexicon file, an allomorph rule file, a morphology rule file, an extended symbol file (optional), and a syntax rule file (optional). The type of a file can be recognised by the suffix of the file name. A grammar for the English language may consist of the files ``english.sym'', ``english.lex'', ``english.all'', ``english.mor'' and ``english.syn''.

6.12.1  The Symbol File

$$ Symbol-File ::= {Symbol-Definition | Include} .
A symbol file has the suffix ``.sym''. It contains the symbol table.

6.12.2  The Extended Symbol File

$$ Extended-Symbol-File ::= Symbol-File .
An extended symbol file has the suffix ``.esym''. It contains an additional symbol table that contains symbols that may only be used in the syntax rule file.

6.12.3  The Lexicon File

$$ Lexicon-File ::= {Constant-Definition | Constant-Expression ";"} .
A lexicon file has the suffix ``.lex''. It consists of any number of values and constant definitions, each terminated by a semicolon. Each value stands for a lexical entry. A value may contain named constants and the operators ``.'', ``+'', ``-'', ``*'', and ``/''. The format of the lexical entries is free, although it should be consistent with the conception of the whole rule system.

6.12.4  The Allomorph Rule File

$$ Rule-File ::= {Rule | Constant-Definition | Initial | Include} .
$$ Allomorph-Rule-File ::= Rule-File .
The allomorph lexicon is generated from the base form lexicon by applying the allo-rule to the base form entries. The allomorph generation rule file has the suffix ``.all'' and consists of one allo-rule, an optional output-filter, and any number of subrules and constant definitions.

For every lexical entry, the allo-rule is executed with the value of the lexicon entry as parameter. The allo-rule can generate allomorphs using the result statement.

After all allomorphs have been produced, the output-filter is executed once for each surface in the (intermediate) allomorph lexicon. As parameter, the output-filter gets the list of categories that share that surface. An entry in the final allomorph lexicon is created every time the result statement is executed. The surface cannot be changed by the output-filter.
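
The two phases of allomorph generation can be modelled as follows. This Python sketch is not part of Malaga and makes several assumptions: the function name build_allomorph_lexicon is hypothetical, the allo-rule is modelled as a function that returns one (surface, category) pair per executed result statement, and the output-filter is modelled as a function from a list of categories to a list of categories.

```python
def build_allomorph_lexicon(lexicon, allo_rule, output_filter=None):
    """Model of allomorph generation.

    allo_rule(entry) returns a list of (surface, category) pairs.
    output_filter(categories) returns the filtered categories of one surface.
    """
    intermediate = {}   # surface -> list of categories, in order of creation
    for entry in lexicon:
        for surface, category in allo_rule(entry):
            intermediate.setdefault(surface, []).append(category)
    if output_filter is None:
        return [(s, c) for s, cats in intermediate.items() for c in cats]
    # The filter is called once per surface and cannot change the surface.
    return [(s, c) for s, cats in intermediate.items() for c in output_filter(cats)]
```

A filter that removes duplicate categories, for example, would reduce three entries for the same surface with the categories verb, noun, verb to two entries.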

6.12.5  The Combi-Rule Files

$$ Combi-Rule-File ::= Rule-File .
A grammar system includes up to two combination rule files: one for morphological combination with the suffix ``.mor'' and (optionally) one for syntactic combination with the suffix ``.syn''.

A combination rule file consists of an initial state and any number of combi-rules, subrules, and constant definitions. A syntax rule file may contain one optional pruning-rule, one optional input-filter and one optional output-filter; a morphology rule file may contain one optional robust-rule and one optional output-filter.

Beginning with the rules listed in the initial state, the rules and their successors are processed until a result statement with the keyword accept is encountered in every path. A path dies if there is no more input (from the lexicon or from the morphology) that can be processed.

In morphology, if analysis has created no result and robust analysis has been switched on, the robust-rule will be called with the analysis surface and can create a result.

In syntax, when a new word form has been imported from morphology, the input-filter can take a look at its categories and create new result categories.

In syntax, if a pruning-rule is present and pruning has been activated, the concatenation of the next word form is preceded by the following step: The categories of all current LAG states are merged into a list, which is the parameter of the pruning rule. The pruning-rule must execute a return statement with a list of yes- and no-symbols. Each state in the category list corresponds to a symbol in the result list. If the symbol is yes, the corresponding state is preserved. If the symbol is no, the state is abandoned.

After analysis, the output-filter can take a look at all result categories and create new result categories.

