3.1 Projects

Several files, taken together, form a Malaga grammar:

The lexicon file (.lex)
A lexicon of base forms.
The prelex file (.prelex, optional)
A precompiled lexicon in binary format.
The allomorph rule file (.all)
A file with rules which generate the allomorphs of the base forms.
The morphology rule file (.mor)
A file with rules which combine allomorphs into word forms.
The symbol file (.sym)
A file with the symbols that may be used in rules and feature structures.
The syntax rule file (.syn, optional)
A file with rules that combine word forms into sentences.
The extended symbol file (.esym, optional)
A file with additional symbols that may only be used in a syntax rule file.

You can group these files into a project. To do this, write a project file with a name ending in .pro that lists the names of the files, each one after a keyword that identifies its type, with each file type on a line of its own. Suppose you have written a grammar that consists of the files standard.sym, webster.lex, english.all, english.mor, and english.syn. The project file for this grammar looks like this:

     sym: standard.sym
     lex: webster.lex
     all: english.all
     mor: english.mor
     syn: english.syn
     
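The keyword-and-file layout above is simple enough to read mechanically. The following Python sketch is an illustration only and is not part of Malaga: the function name parse_project is hypothetical, and it handles only lines that list files after a keyword.

```python
from collections import defaultdict


def parse_project(text):
    """Group the file names in a project file by keyword (sym, lex, ...).

    Hypothetical helper for illustration; it assumes every non-blank
    line has the form "keyword: file [file ...]".
    """
    files = defaultdict(list)
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        keyword, sep, rest = line.partition(":")
        if not sep:
            raise ValueError(f"missing keyword in line: {line!r}")
        # Repeated keywords accumulate their file names in order.
        files[keyword.strip()].extend(rest.split())
    return dict(files)


project = """\
sym: standard.sym
lex: webster.lex
all: english.all
mor: english.mor
syn: english.syn
"""

for keyword, names in parse_project(project).items():
    print(keyword, names)
```

Because file names are appended per keyword, the same sketch also copes with a keyword that occurs on more than one line.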

In your source files, you can include further source files with the include statement, so a binary file of your grammar may depend on several source files. The program malmake uses the information in the project file to check for dependencies between source files and binaries, so the project file must contain the names of all source files for each binary. Relative path names are always interpreted relative to the directory of the project file.

Assume you have a lexicon file webster.lex that looks like this:

     include "suffixes.lex";
     include "verbs.lex";
     include "adjectives.lex";
     include "nouns.lex";
     include "particles.lex";
     include "abbreviations.lex";
     include "names.lex";
     include "numbers.lex";
     

In this case, you must list the names of all these files in the lex: line of your project file, after the name of the main lexicon file:

     lex: webster.lex suffixes.lex verbs.lex adjectives.lex
     lex: nouns.lex particles.lex abbreviations.lex names.lex numbers.lex
     

Since this example involves many files, the lex: entry has been split across two lines, each starting with lex:.
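Keeping the lex: entry in step with the include statements by hand is error-prone, so it can help to derive it from the lexicon file. The Python sketch below is a hypothetical helper, not a Malaga tool: it assumes that each include statement sits on its own line in the form shown above, and it follows neither nested includes nor comments.

```python
import re

# Matches lines such as:  include "verbs.lex";
INCLUDE = re.compile(r'^\s*include\s+"([^"]+)"\s*;', re.MULTILINE)


def lex_lines(main_file, source_text, per_line=4):
    """Build lex: lines naming the main lexicon file and its includes."""
    names = [main_file] + INCLUDE.findall(source_text)
    # Split the names into chunks so that no lex: line grows too long.
    return [
        "lex: " + " ".join(names[i:i + per_line])
        for i in range(0, len(names), per_line)
    ]


webster = '''\
include "suffixes.lex";
include "verbs.lex";
include "adjectives.lex";
include "nouns.lex";
'''

for line in lex_lines("webster.lex", webster):
    print(line)
```

The per_line parameter is an arbitrary choice in this sketch; any split works as long as every resulting line starts with lex:.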

If you want to extend an existing project (for example, to add a syntax rule file to a morphology grammar), you can include the project file of the morphology grammar in the project file of your syntax grammar with a line starting with include::

     include: /projects/grammars/english/english.pro
     syn: english-syntax.syn
     

The file entries in the project file of the morphology grammar are treated as if they replaced the include: line. Relative paths in the included project file are relative to the directory of that included file, not to the directory of the including file.
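The substitution rule and the handling of relative paths can be illustrated with a short Python sketch. It is a sketch under stated assumptions and not Malaga's implementation: expand_project is a hypothetical name, every line other than include: is treated as a list of files, the directory /work/syntax is invented for the demonstration, and the file contents come from a dictionary instead of the disk.

```python
from pathlib import PurePosixPath


def expand_project(path, read_text):
    """Return (keyword, file) pairs with include: lines expanded in place.

    read_text is any callable that maps a path string to file contents.
    """
    path = PurePosixPath(path)
    entries = []
    for line in read_text(str(path)).splitlines():
        keyword, sep, rest = line.strip().partition(":")
        if not sep:
            continue  # skip blank lines
        if keyword == "include":
            # The included entries take the place of the include: line.
            entries.extend(expand_project(path.parent / rest.strip(), read_text))
        else:
            # Relative names resolve against the directory of the
            # project file that mentions them.
            entries.extend((keyword, str(path.parent / name)) for name in rest.split())
    return entries


# In-memory stand-ins for two project files (hypothetical contents).
files = {
    "/projects/grammars/english/english.pro": "sym: standard.sym\nmor: english.mor\n",
    "/work/syntax/english-syntax.pro": (
        "include: /projects/grammars/english/english.pro\n"
        "syn: english-syntax.syn\n"
    ),
}

for entry in expand_project("/work/syntax/english-syntax.pro", files.__getitem__):
    print(entry)
```

Passing the reader as a callable keeps the sketch independent of the file system; a real tool would read the files from disk.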

The programs malaga and mallex can set options like hidden or robust from the project file, so you do not need to set these options each time you start one of these programs. Each line in the project file that starts with malaga: is executed when malaga starts, and each line that starts with mallex: is executed when mallex starts. Only the set command is allowed in these lines, so the project file can only set options. Here is an example:

       ...
     malaga: set hidden +semantics
     malaga: set robust-rule on
     mallex: set hidden +semantics +syntax
       ...
     

When you start malaga, the commands set hidden +semantics and set robust-rule on will be executed; when you start mallex, the command set hidden +semantics +syntax will be executed.
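The selection of startup commands for each program can be sketched in a few lines of Python. This is an illustration and not Malaga's code: the function name startup_commands is hypothetical, and rejecting other commands with a ValueError is an assumption about how such a check might look.

```python
def startup_commands(project_text, program):
    """Collect the set commands that a project file defines for one program."""
    prefix = program + ":"
    commands = []
    for line in project_text.splitlines():
        line = line.strip()
        if not line.startswith(prefix):
            continue  # line belongs to another keyword
        command = line[len(prefix):].strip()
        # Only the set command may appear in malaga: and mallex: lines.
        if command.split()[:1] != ["set"]:
            raise ValueError(f"only set commands are allowed: {command!r}")
        commands.append(command)
    return commands


project = """\
malaga: set hidden +semantics
malaga: set robust-rule on
mallex: set hidden +semantics +syntax
"""

print(startup_commands(project, "malaga"))
print(startup_commands(project, "mallex"))
```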

Options in a project file that is read in through an include: line of another project file are executed as if they stood in place of the include: line.

Lines in project files that start with info: contain information about the grammar. In malaga, the command info displays this information. Example:

     info: =====================================
     info: Deutsche Malaga Morphologie 3.0
     info: written by Oliver Lorenz, 11.04.1997
     info: =====================================
     

The malshow display program normally assumes that the character set is iso8859-1. If your grammar uses a different character set, insert the name of the character set into your project file:

     char-set: iso8859-5
     

The Korean writing system, Hangul, needs special treatment, because the characters it uses are syllables that must be split up into individual letters for morphological analysis. Such a conversion is built into malaga. To activate it, insert the following line into your project file:

     char-set: hangul
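To see what splitting syllables into letters means, the Python sketch below uses Unicode canonical decomposition (NFD) from the standard library. This is an analogy only: it is not the conversion that malaga performs, and the internal letter representation malaga uses may differ. The function name split_syllables is hypothetical.

```python
import unicodedata


def split_syllables(text):
    """Decompose precomposed Hangul syllables into individual letters (jamo)."""
    return unicodedata.normalize("NFD", text)


word = "\ud55c\uae00"  # the two syllables of the word "Hangul"
letters = split_syllables(word)

# Each syllable block decomposes into its consonant and vowel letters.
print(len(word), "syllables,", len(letters), "letters")
print([unicodedata.name(ch) for ch in letters])
```

Recomposing with unicodedata.normalize("NFC", letters) yields the original syllables again, so this decomposition loses no information.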