Chapter 3 The Malaga Programs
The Malaga programs are all started in a similar manner: either you give the
name of a project file as an argument (this is not possible for malrul or
malsym), or you give the names of the files needed by the program (for
malmake, you must give the project file as the argument). The type of each
file is recognised by its file name ending.
Assume you've written a grammar that consists of a symbol file ``english.sym'',
an allomorph rule file ``english.all'', a lexicon file ``english.lex'' and a
morphology rule file ``english.mor'', and you have also written a project file
``english.pro''. Then, after compiling the grammar files, you can start the
program malaga in two ways:
malaga english.pro
or
malaga english.sym_c english.mor_c english.lex_c
If you use the first command line, the names of the grammar files are read
from the project file. The second command line gives the names of the compiled
files explicitly; their order does not matter. Do not include the name of the
allomorph rule file when starting malaga: malaga itself does not use this
file; it is only needed by mallex to compile the lexicon file.
If you just want to know which version of a Malaga program you are using, you
can get the version number by using the option ``-version'' or ``-v'':
malrul -version
The program then only prints a few lines of information about its version
number and its purpose.
3.1 Projects
Several files together form a Malaga grammar:
- a lexicon of base forms (the lexicon file, ending in ``.lex''),
- a file with rules that generate the allomorphs of the base forms (the
allomorph rule file, ending in ``.all''),
- a file with LAG rules that combine allomorphs into word forms (the
morphology rule file, ending in ``.mor''),
- (optionally) a file with LAG rules that combine word forms into sentences
(the syntax rule file, ending in ``.syn''),
- a file with the category symbols used (the symbol file, ending in
``.sym''), and
- (optionally) a file with additional category symbols that may only be used
in a syntax rule file (the extended symbol file, ending in ``.esym'').
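Since the programs tell these files apart purely by their endings, the list
above amounts to a lookup table. The Python sketch below only illustrates that
idea; the table ``SOURCE_ROLES'' and the function ``file_role'' are
hypothetical names and are not part of Malaga.

```python
import os

# Hypothetical lookup table: source-file ending -> role in a Malaga grammar,
# taken from the list of grammar files above.
SOURCE_ROLES = {
    ".lex": "lexicon",
    ".all": "allomorph rules",
    ".mor": "morphology rules",
    ".syn": "syntax rules",
    ".sym": "symbols",
    ".esym": "extended symbols",
}


def file_role(name: str) -> str:
    """Classify a grammar source file by its ending alone."""
    _, ending = os.path.splitext(name)
    if ending not in SOURCE_ROLES:
        raise ValueError(f"unknown grammar file ending: {name!r}")
    return SOURCE_ROLES[ending]
```

For example, ``file_role("english.mor")'' yields ``"morphology rules"'',
regardless of the directory the file lives in.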
You can group these files together into a project. To do this, you have to
write a project file, with a name ending in ``.pro'', in which you list the
names of the files, each after a keyword that identifies its file type (each
file type on a line of its own). Imagine you have written a grammar that
consists of the files ``standard.sym'', ``webster.lex'', ``english.all'',
``english.mor'' and ``english.syn''. The project file for this grammar looks
like this:
sym: standard.sym
lex: webster.lex
all: english.all
mor: english.mor
syn: english.syn
By using the include statement, you can include further source files in
your source files, so one part of your grammar can consist of several files.
Assume you've got a lexicon file ``webster.lex'' that looks like this:
include "suffixes.lex";
include "verbs.lex";
include "adjectives.lex";
include "nouns.lex";
include "particles.lex";
include "abbreviations.lex";
include "names.lex";
include "numbers.lex";
In this case, you must list the names of all these files in the ``lex:''
line of your project file, after the name of the main lexicon file:
lex: webster.lex suffixes.lex verbs.lex adjectives.lex
lex: nouns.lex particles.lex abbreviations.lex names.lex numbers.lex
Since there are many files in this example, the ``lex:'' line has been split
into two lines, each starting with ``lex:''.
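To make the format concrete, here is a small Python sketch of how a tool might
read the file lists from a project file, letting repeated keywords such as
``lex:'' accumulate. ``parse_project'' is a hypothetical helper, not Malaga's
own parser; it only recognises the file keywords shown in this chapter and
deliberately skips option, info and ``include:'' lines.

```python
from collections import defaultdict

# File keywords shown in this chapter's project-file examples.
FILE_KEYWORDS = {"sym", "lex", "all", "mor", "syn"}


def parse_project(text: str) -> dict:
    """Map each file keyword to the list of file names given for it.

    A keyword may occur on several lines; its file names are
    concatenated in the order in which they appear.
    """
    files = defaultdict(list)
    for line in text.splitlines():
        keyword, sep, rest = line.partition(":")
        keyword = keyword.strip()
        if sep and keyword in FILE_KEYWORDS:
            files[keyword].extend(rest.split())
        # Other lines (malaga:, mallex:, morinfo:, include:, ...) are ignored.
    return dict(files)
```

With the two ``lex:'' lines above, the resulting ``lex'' list starts with
``webster.lex'' and ends with ``numbers.lex''.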
If you want to extend an existing project (for example, you might want to add a
syntax rule file to a morphology grammar), you can include the project file of
the morphology grammar in the project file of your syntax grammar by using a
line starting with ``include:'':
include: /projects/grammars/english/english.pro
syn: english_syntax.syn
The file entries in the project file of the morphology grammar are treated as
if they stood in place of the ``include:'' line.
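This substitution rule can be pictured as a recursive text expansion. In the
Python sketch below, ``expand_includes'' is a hypothetical helper; it takes the
file-reading function as a parameter so that it can be tried without real
files, uses the included path exactly as written, and adds a cycle check of
its own. How Malaga itself resolves relative include paths is not covered here.

```python
from typing import Callable, FrozenSet, List


def expand_includes(path: str, read: Callable[[str], str],
                    _active: FrozenSet[str] = frozenset()) -> List[str]:
    """Return the lines of a project file, with every 'include:' line
    replaced by the recursively expanded lines of the included file."""
    if path in _active:
        raise ValueError(f"cyclic include of {path!r}")
    lines: List[str] = []
    for line in read(path).splitlines():
        keyword, sep, rest = line.partition(":")
        if sep and keyword.strip() == "include":
            # Splice the included file's entries in at this position.
            lines.extend(expand_includes(rest.strip(), read, _active | {path}))
        else:
            lines.append(line)
    return lines
```

Passing ``open(p).read''-style functions or a dictionary lookup as ``read''
keeps the expansion logic independent of where the project files come from.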
The programs malaga and mallex can set options like hidden or robust from
the project file, so you do not need to set these options each time you start
them. Each line in the project file that starts with ``malaga:'' (or
``mallex:'') is executed when malaga (or mallex) has been started. Only the
set command may be used in these lines, so they can only set options. Here's
an example:
...
malaga: set hidden +semantics
malaga: set robust on
mallex: set hidden +semantics +syntax
When you start malaga, the commands ``set hidden +semantics'' and
``set robust on'' will be executed; when you start mallex, the
command ``set hidden +semantics +syntax'' will be executed.
Options in project files that are read in by ``include:'' lines in other
project files will be executed as if they were at the position of the ``include:'' line.
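The rule that these lines may only contain the set command can be expressed as
a simple filter. The Python sketch below assumes the project file has already
been read into a list of lines, with any ``include:'' lines expanded in place;
``startup_commands'' is a hypothetical name, and the check only looks at the
first word of each command.

```python
from typing import List


def startup_commands(lines: List[str], program: str) -> List[str]:
    """Return, in file order, the commands that `program` ("malaga" or
    "mallex") would execute from the given project-file lines."""
    commands: List[str] = []
    for line in lines:
        keyword, sep, rest = line.partition(":")
        if not sep or keyword.strip() != program:
            continue
        command = rest.strip()
        # Only the set command is allowed in these lines.
        if command.split()[:1] != ["set"]:
            raise ValueError(f"only 'set' is allowed here: {line!r}")
        commands.append(command)
    return commands
```

Applied to the example above, it returns ``set hidden +semantics'' and
``set robust on'' for malaga, and ``set hidden +semantics +syntax'' for
mallex.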
Lines that start with ``morinfo:'' contain information about the
morphology; lines that start with ``syninfo:'' contain information about
the syntax. In malaga, you get this information if you use the command
info. Example:
morinfo: =====================================
morinfo: Deutsche Malaga Morphologie 3.0
morinfo: written by Oliver Lorenz, 11.04.1997
morinfo: dmm@linguistik.uni-erlangen.de
morinfo: =====================================
3.2 The Malaga startup file ``.malagarc''
If there are options you want to use with every Malaga project, you can
create a personal startup file called ``.malagarc'' in your home directory.
You can enter malaga and mallex options in it in the same manner as in the
project file:
malaga: set hidden +semantics
malaga: set robust on
mallex: set hidden +semantics +syntax
The options in the project file are executed first, so options set in the
startup file override those set in the project file. If you want to use the
graphical display program written in Tcl/Tk, you should set the display
option in the startup file.
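The precedence rule means that, for an option set in both places, the value
from ``.malagarc'' is the one that remains in effect. The Python sketch below
shows this ordering; ``effective_options'' is hypothetical, and it simplifies
matters by treating every set command as replacing the option's previous value
outright, which need not match how Malaga combines arguments such as
``+semantics''.

```python
from typing import Dict, List


def effective_options(project_cmds: List[str],
                      startup_cmds: List[str]) -> Dict[str, str]:
    """Apply 'set <option> <value...>' commands in execution order:
    the project file's commands first, then the startup file's."""
    options: Dict[str, str] = {}
    for command in [*project_cmds, *startup_cmds]:
        words = command.split()
        if len(words) < 2 or words[0] != "set":
            raise ValueError(f"not a set command: {command!r}")
        # Simplification: a later command replaces the earlier value.
        options[words[1]] = " ".join(words[2:])
    return options
```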
You can set some attributes of the graphical user interface, namely the
position, the size, and the font size of each window that is part of the user
interface. Here is an example that sets every available option:
result_geometry: 628x480+640+0
result_font_size: 12
tree_geometry: 628x480+640+512
tree_font_size: 12
path_geometry: 628x480+640+0
path_font_size: 12
variables_geometry: 628x480+0+512
variables_font_size: 12
The geometry defines the size and/or position of each window. The first two
numbers (``628x480'') define the width and the height of the window in
pixels; the last two numbers (``+640+512'') define the position of its
upper left corner on the screen. The available font sizes are 8, 10, 12, 14,
and 18 pixels.
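The geometry strings follow the familiar X11 ``WIDTHxHEIGHT+X+Y'' pattern.
The Python sketch below parses the forms described in this section (size,
position, or both) and checks the font sizes; ``parse_geometry'',
``Geometry'' and ``check_font_size'' are hypothetical names, and negative
(``-X-Y'') offsets, which X11 also allows, are deliberately not handled.

```python
import re
from typing import NamedTuple, Optional


class Geometry(NamedTuple):
    width: Optional[int]
    height: Optional[int]
    x: Optional[int]   # offset of the upper left corner from the left edge
    y: Optional[int]   # offset of the upper left corner from the top edge


# Optional "WIDTHxHEIGHT" part followed by an optional "+X+Y" part.
GEOMETRY_RE = re.compile(r"(?:(\d+)x(\d+))?(?:\+(\d+)\+(\d+))?")
FONT_SIZES = {8, 10, 12, 14, 18}


def parse_geometry(spec: str) -> Geometry:
    spec = spec.strip()
    match = GEOMETRY_RE.fullmatch(spec)
    if not spec or match is None:
        raise ValueError(f"bad geometry: {spec!r}")
    return Geometry(*(int(g) if g is not None else None for g in match.groups()))


def check_font_size(size: int) -> int:
    if size not in FONT_SIZES:
        raise ValueError(f"unsupported font size: {size}")
    return size
```

For instance, ``parse_geometry("628x480+640+512")'' gives a 628 by 480 window
whose upper left corner is 640 pixels from the left and 512 from the top.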
3.3 The Program ``malaga''
The program malaga is the user interface for analysing word forms and
sentences, displaying the results and finding bugs in a grammar. You can start
malaga by giving either the name of a project file or the names of the
grammar files as arguments:
malaga english.pro
or
malaga english.sym_c english.mor_c english.lex_c english.syn_c
If you are not using a project file, you have to give:
- the symbol file,
- the lexicon file,
- the morphology rule file, and
- the syntax rule file (optional).
When malaga has been started, it loads the symbol file, the lexicon file
and the rule file(s). After loading, the prompt appears. Then malaga is ready to execute your commands:
malaga (4.3) - Copyright (C) 1995-1999 Bjoern Beutel
This program comes with ABSOLUTELY NO WARRANTY.
This is free software which you may redistribute under certain conditions.
For details, refer to the GNU General Public License.
malaga>
You can now enter any malaga command. If you are not sure about the name
of a command, use the command help to get an overview of all malaga
commands.
If you want to quit malaga, enter the command quit.
You can use the following command line options when you start malaga:
- ``-morphology'' or ``-m'' starts malaga in morphology mode. That is, word
forms are read from the standard input stream and analysed (one word form per
line). The analysis results are written to the standard output stream.
- ``-syntax'' or ``-s'' starts malaga in syntax mode. That is, sentences are
read from the standard input stream and analysed (one sentence per line). The
analysis results are written to the standard output stream.
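Because these modes read from standard input and write to standard output,
malaga can be driven from another program. The Python sketch below assumes
that malaga is on the PATH, that the grammar has been compiled, and that the
option may be placed before the project file name; ``analyse'' is a
hypothetical helper, and it returns malaga's output unparsed, since the output
format is not described here.

```python
import subprocess
from typing import Callable, Sequence


def analyse(project: str, items: Sequence[str], mode: str = "-morphology",
            run: Callable = subprocess.run) -> str:
    """Feed word forms (mode "-morphology") or sentences (mode "-syntax")
    to malaga, one per line, and return its standard output unparsed."""
    if mode not in ("-morphology", "-syntax"):
        raise ValueError(f"unknown mode: {mode!r}")
    if any("\n" in item for item in items):
        raise ValueError("each item must fit on a single line")
    completed = run(
        ["malaga", mode, project],
        input="".join(item + "\n" for item in items),
        capture_output=True,
        text=True,
        check=True,
    )
    return completed.stdout
```

A call such as ``analyse("english.pro", ["houses", "walked"])'' would then
print nothing itself but return whatever malaga wrote for the two word forms.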
3.4 The Program ``mallex''
mallex applies the allomorph rules to the entries of a base form lexicon
and builds a run-time lexicon (with the ending ``.lex_c'') from them.
Normally, mallex starts in batch mode. If you want to run it interactively,
you must give it the option ``-interactive'' or ``-i'' when starting it (if
you start it from Emacs with ``M-x mallex'', this is done automatically).
You can start mallex either with the name of a project file or with the
names of the needed grammar files:
mallex english.pro
or
mallex english.sym_c english.all_c english.lex
If you are not using a project file, you must give:
- the symbol file,
- the allomorph rule file, and
- the lexicon file (in batch mode).
If you have started mallex with the option ``-interactive'' or
``-i'', mallex runs interactively: it loads the symbol file and the
allomorph rule file. Then the prompt appears:
mallex (4.3) - Copyright (C) 1995-1999 Bjoern Beutel
This program comes with ABSOLUTELY NO WARRANTY.
This is free software which you may redistribute under certain conditions.
For details, refer to the GNU General Public License.
mallex>
You can now enter any mallex command. If you do not remember the command
names, you can use the command help to see an overview of the mallex commands.
If you want to quit mallex, enter the command quit.
If you've started mallex in batch mode, it creates the run time lexicon
file from the base form lexicon file. If the lexicons are very big or the
allomorph rules are very complex, this can take some minutes. After creation,
mallex quits.
You can use the following command line options when you start mallex:
- ``-interactive'' or ``-i'' runs mallex in interactive
mode.
- ``-readable'' or ``-r'' runs mallex in batch mode and
outputs the allomorph lexicon in readable form on the standard output stream.
3.5 The Program ``malmake''
The program malmake reads a project file, checks that all needed grammar files
exist, and translates all grammar files that have not yet been translated or
whose source files have changed since they were last translated. If needed,
malmake itself calls the programs malsym, mallex and malrul. An example:
assume you have written a morphology grammar whose grammar files are bundled
in a project file ``english.pro'':
sym: rules/english.sym
all: rules/english.all
lex: rules/english.lex lex/adjectives.lex
lex: lex/particles.lex lex/suffixes.lex lex/verbs.lex
lex: lex/nouns.lex lex/abbreviations.lex lex/numbers.lex
mor: rules/english.mor
mallex: set hidden +semantics +syntax
malaga: set hidden +semantics
When executing ``malmake english.pro'' for the first time, the symbol file,
the rule files and the lexicon file will be translated:
compiling "english.sym"
compiling "english.all"
compiling "english.mor"
compiling "english.lex"
project is up to date
The translation of a big lexicon can take a long time, since the allomorph
rules have to be executed for each lexicon entry.
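The up-to-date check described at the start of this section can be
approximated by comparing modification times. The Python sketch below is
hypothetical and is not malmake's actual logic: ``needs_translation'' treats a
compiled file as out of date if it is missing or older than any of the source
files it is given, and it leaves the choice of those source files (for
instance, all files of the ``lex:'' entries) to the caller.

```python
import os
from typing import Callable, Optional, Sequence


def file_mtime(path: str) -> Optional[float]:
    """Modification time of `path`, or None if the file does not exist."""
    try:
        return os.path.getmtime(path)
    except FileNotFoundError:
        return None


def needs_translation(sources: Sequence[str], target: str,
                      mtime: Callable[[str], Optional[float]] = file_mtime) -> bool:
    """True if `target` is missing or older than any of `sources`."""
    if not sources:
        raise ValueError("at least one source file is required")
    source_times = []
    for source in sources:
        t = mtime(source)
        if t is None:
            # Analogous to checking that all needed grammar files exist.
            raise FileNotFoundError(source)
        source_times.append(t)
    target_time = mtime(target)
    return target_time is None or max(source_times) > target_time
```

The ``mtime'' parameter defaults to the file system but can be replaced, which
makes the rule easy to try out without touching real files.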
3.6 The Program ``malrul''
The program malrul translates Malaga rule files, i.e. files that have
the endings ``.all'', ``.mor'' or ``.syn''. The compiled file gets the
ending ``.all_c'', ``.mor_c'', or ``.syn_c'', respectively.
Give the following arguments if you are starting malrul:
- the rule file that is to be translated, and
- the associated symbol file.
The order of the arguments is arbitrary. Here is an example:
malrul english.mor english.sym_c
3.7 The Program ``malsym''
malsym can translate Malaga symbol files, i.e. files having the
ending ``.sym'' or ``.esym''. The translated file gets the ending
``.sym_c'' or ``.esym_c''.
For example:
malsym english.sym
If you are translating an extended symbol file with the ending ``.esym'',
enter the name of the compiled symbol file as an additional argument:
malsym english.esym english.sym_c
This argument is needed since extended symbol files are extensions of ordinary
symbol files.
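The naming convention is the same for all the translators described in this
chapter: the compiled file keeps the source name and appends ``_c'' to its
ending. The Python sketch below summarises this, together with which program
translates which ending (``.lex'' files being translated by mallex, as
described in section 3.4); ``compiled_name'' and ``translator_for'' are
hypothetical helpers, not Malaga commands.

```python
import os

# Which program translates which source ending, per this chapter.
TRANSLATORS = {
    ".all": "malrul", ".mor": "malrul", ".syn": "malrul",
    ".sym": "malsym", ".esym": "malsym",
    ".lex": "mallex",
}


def translator_for(source: str) -> str:
    """Name of the program that translates `source`."""
    _, ending = os.path.splitext(source)
    if ending not in TRANSLATORS:
        raise ValueError(f"not a Malaga source file: {source!r}")
    return TRANSLATORS[ending]


def compiled_name(source: str) -> str:
    """Compiled file name, e.g. 'english.mor' -> 'english.mor_c'."""
    translator_for(source)  # validates the ending
    return source + "_c"
```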