This chapter explains how to parse sentences with a grammar developed with the MAYZ toolkit.
While the MAYZ toolkit supports the development of a lexicon and templates, parsing sentences with a grammar developed with MAYZ requires a parser. The MAYZ package includes "UP", an efficient general-purpose parser for unification-based grammars. By implementing the interfaces required by UP, you can parse sentences with the developed grammar.
To use UP, interfaces for accessing a grammar and probabilistic models must be implemented. The interfaces are defined in "mayz/parser.lil".
The following interfaces of UP are the minimum required for parsing. Grammar writers need to implement all of them, except where noted.
sentence_to_word_lattice(+$Input, -$WordLattice) | |
$Input | input sentence |
$WordLattice | list of extent |
Splits an input sentence $Input into words, and returns a word lattice $WordLattice. |
lexical_entry(+$Word, -$LexName) | |
$Word | input word |
$LexName | name of a lexical entry |
Returns the name of a lexical entry assigned to $Word. A word can have multiple $LexName. |
lexical_entry_sign(+$LexName, -$Sign) | |
$LexName | name of a lexical entry |
$Sign | sign of a lexical entry |
Returns the sign of a lexical entry. A unique sign must be assigned to $LexName. |
id_schema_unary(+$SchemaName, +$Dtr, -$Mother, -$DCP) | |
$SchemaName | schema name |
$Dtr | sign of the daughter |
$Mother | sign of the mother |
$DCP | LiLFeS program executed after schema application |
Applies a unary schema. If your grammar does not require unary rules, this need not be implemented. |
id_schema_binary(+$SchemaName, +$Left, +$Right, -$Mother, -$DCP) | |
$SchemaName | schema name |
$Left | sign of the left daughter |
$Right | sign of the right daughter |
$Mother | sign of the mother |
$DCP | LiLFeS program executed after schema application |
Applies a binary schema. |
root_sign($Sign) | |
$Sign | sign of the root node |
Succeeds if $Sign satisfies the condition of a root node. |
reduce_sign(+$InSign, -$OutSign, -$SignPlus) | |
$InSign | the sign of the mother of schema application |
$OutSign | a reduced sign |
$SignPlus | information removed from $OutSign |
This predicate is applied to the mother sign after a schema application succeeds. In the subsequent parsing process, $OutSign is used instead of $InSign. By removing unnecessary information (e.g. daughter structures) from $InSign, equivalent $OutSigns are factored and treated as a single sign in subsequent processing. $SignPlus can hold the information removed from the sign; it is stored in SIGN_PLUS of 'edge_link'. |
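The factoring described for 'reduce_sign/3' can be illustrated outside LiLFeS. The following Python sketch is only an analogy under stated assumptions: signs are modeled as flat dictionaries, the key "dtrs" stands for daughter structures, and the function names are hypothetical. It does not reproduce how UP represents or compares feature structures.

```python
from collections import defaultdict


def reduce_sign(in_sign):
    """Hypothetical stand-in for reduce_sign/3.

    Splits a sign into the part kept for further parsing (out_sign)
    and the part that is set aside (sign_plus).
    """
    out_sign = {k: v for k, v in in_sign.items() if k != "dtrs"}
    sign_plus = {"dtrs": in_sign.get("dtrs")}
    return out_sign, sign_plus


def factor(mother_signs):
    """Group mother signs whose reduced forms are identical."""
    cell = defaultdict(list)
    for sign in mother_signs:
        out_sign, sign_plus = reduce_sign(sign)
        # A sorted tuple of items gives a hashable key for the reduced sign.
        key = tuple(sorted(out_sign.items()))
        # The removed information is kept alongside the shared reduced sign.
        cell[key].append(sign_plus)
    return cell


signs = [
    {"cat": "S", "dtrs": ("NP", "VP")},
    {"cat": "S", "dtrs": ("S", "PP")},
    {"cat": "NP", "dtrs": ("Det", "N")},
]
cell = factor(signs)
print(len(cell))
```

In this sketch the two "S" signs differ only in their daughters, so they collapse into one reduced sign that carries two pieces of set-aside information, while the "NP" sign stays separate.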
"mayz/sample_hpsg.lil" is an example grammar of HPSG and includes a sample implementation of the above interfaces.
Because the above interfaces give no access to probabilistic models, the parser cannot perform disambiguation. If your grammar implements only the above interfaces, run UP with the option "-nofom". For example, to use "mayz/sample_hpsg.lil", run the following command.
% up -i -nofom -l mayz/sample_hpsg
When you need disambiguation, the following interfaces must also be implemented. Once they are implemented, UP computes figures of merit (FOMs) during parsing, and the best analysis can be obtained with predicates such as 'best_fom_sign/2'. Since FOMs are summed up, use log-probabilities when you apply probabilistic models.
fom_root(+$Sign, -$FOM) | |
$Sign | sign of the root node |
$FOM | FOM of the root node |
Returns FOM of the root node. |
fom_binary(+$RuleName, +$LeftDtr, +$RightDtr, +$MotherSign, +$SignPlus, -$FOM) | |
$RuleName | schema name |
$LeftDtr | sign of the left daughter |
$RightDtr | sign of the right daughter |
$MotherSign | sign of the mother |
$SignPlus | 3rd argument of 'reduce_sign/3' |
$FOM | FOM |
Returns FOM of binary schema application. |
fom_unary(+$RuleName, +$Dtr, +$MotherSign, +$SignPlus, -$FOM) | |
$RuleName | schema name |
$Dtr | sign of the daughter |
$MotherSign | sign of the mother |
$SignPlus | 3rd argument of 'reduce_sign/3' |
$FOM | FOM |
Returns FOM of unary schema application. |
fom_terminal(+$LexName, +$Sign, +$SignPlus, -$FOM) | |
$LexName | LEX_NAME (the second argument of 'lexical_entry/2') |
$Sign | sign of a lexical entry |
$SignPlus | 3rd argument of 'reduce_sign/3' |
$FOM | FOM |
Returns FOM of a terminal sign. |
fom_lexical_entry(+$Word, +$LexName, -$FOM) | |
$Word | word |
$LexName | LEX_NAME (the second argument of 'lexical_entry/2') |
$FOM | FOM |
Returns FOM of a lexical entry. |
When your grammar implements the above interfaces, run UP with the option "-fom" or "-iter". For example, when the grammar file is "mygrammar.lil", run the following command.
% up -i -iter -l mygrammar
See the UP manual for its other functions.
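The reason for using log-probabilities as FOMs can be checked with a few lines of arithmetic. The Python sketch below is illustrative only: the probabilities are made-up values and the function name is hypothetical. It shows that summing log-probabilities gives the logarithm of the product of the probabilities, which is what a parser that adds FOMs needs.

```python
import math


def derivation_fom(step_probabilities):
    """Sum the log-probabilities of the steps of one derivation."""
    return sum(math.log(p) for p in step_probabilities)


# Made-up probabilities for, e.g., two lexical entries and one schema application.
probs = [0.5, 0.25, 0.8]
total_fom = derivation_fom(probs)

# Exponentiating the summed FOM recovers the product of the probabilities.
print(math.isclose(math.exp(total_fom), 0.5 * 0.25 * 0.8))
```

If raw probabilities were returned instead, adding them would not correspond to the probability of the whole derivation.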
MAYZ only provides functions for retrieving a lexicon and templates from a database. Grammar developers are expected to implement the interfaces of UP themselves. For details, see "How to use UP".
MAYZ provides the following tools for accessing the databases of a lexicon and templates. They are implemented in "mayz/grammar.lil". MAYZ also provides a tool for employing an external tagger.
import_lexicon($LexFile, $TemplateFile) | |
$LexFile | file name of a lexicon |
$TemplateFile | file name of a template database |
Imports a lexicon and a template database. |
lookup_lexicon(+$Word, -$TempNameList) | |
$Word | a feature structure representing a "word" |
$TempNameList | a list of lex_template |
Returns a list of template names assigned to a word by looking up a lexicon. |
lookup_template(+$TempName, -$Template) | |
$TempName | lex_template |
$Template | a feature structure |
Returns a feature structure of a lexical entry template by looking up a template database. |
To use the above tools, you need to implement the following interfaces.
lexicon_lookup_key(+$Word, -$Key) | |
$Word | a feature structure representing a "word" |
$Key | a key for looking up a lexicon |
Given a feature structure representing a "word" (an element of the list returned by 'sentence_to_word_lattice/2'), this interface returns a key for looking up a lexicon (corresponding to the third argument of 'inverse_lexical_rule/5' and the fourth argument of 'lexical_rule/5'). |
unknown_word_lookup_key(+$Word, -$Key) | |
$Word | a feature structure representing a "word" |
$Key | a key for looking up a lexicon |
Given a feature structure representing a "word", this interface returns a key for looking up a lexicon for an unknown word. |
When constructing lexical entries in 'lexical_entry/2' and 'lexical_entry_sign/2', use the tools 'lookup_lexicon/2' and 'lookup_template/2'.
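The way these tools and interfaces fit together can be sketched as a two-stage lookup: a word is mapped to a key, the key to template names, and each template name to a template. The Python sketch below is a rough analogy, not MAYZ code. The dictionaries, the key format, and the fallback to the unknown-word key when the first lookup fails are all assumptions made for illustration.

```python
# Hypothetical stand-ins for the lexicon and template databases.
LEXICON = {
    "dog/NN": ["noun_template"],
    "UNKNOWN/NN": ["noun_template"],
}
TEMPLATES = {"noun_template": {"CAT": "noun"}}


def lexicon_lookup_key(word):
    """Analogue of lexicon_lookup_key/2 (the key format is an assumption)."""
    return f"{word['surface']}/{word['pos']}"


def unknown_word_lookup_key(word):
    """Analogue of unknown_word_lookup_key/2 (the key format is an assumption)."""
    return f"UNKNOWN/{word['pos']}"


def lexical_entries(word):
    """Return (template name, template) pairs for a word."""
    names = LEXICON.get(lexicon_lookup_key(word))
    if names is None:
        # Assumed behavior: fall back to the unknown-word key on a miss.
        names = LEXICON.get(unknown_word_lookup_key(word), [])
    return [(name, TEMPLATES[name]) for name in names]


known = lexical_entries({"surface": "dog", "pos": "NN"})
unknown = lexical_entries({"surface": "blorp", "pos": "NN"})
print(known == unknown)
```

Here a word missing from the lexicon still receives a template through its part-of-speech key, which is one plausible use of the unknown-word interface.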
Probabilistic models developed using unimaker or forestmaker can be used as a figure-of-merit (FOM) model in UP. MAYZ provides a parser, mayzup, specialized for the probabilistic models developed with MAYZ. This parser provides built-in predicates for computing FOMs (log probabilities) through the interfaces used in the development of probabilistic models, i.e., extract_XXX_event and feature_mask/3.
The following predicates are provided only in mayzup.
init_amis_model(+$ModelName, +$ModelFile) | |
$ModelName | model name |
$ModelFile | name of the parameter file |
Initializes a model by reading parameters from $ModelFile, and also incorporates the corresponding feature_masks. |
delete_amis_model(+$ModelName) | |
$ModelName | model name |
Deletes a model created by 'init_amis_model/2'. |
amis_event_weight(+$ModelName, +$Category, +$Event, -$FOM) | |
$ModelName | model name |
$Category | category name |
$Event | event (list of strings) |
$FOM | FOM of the event (log probability) |
Returns FOM (log probability) of the event represented as a list of strings. 'feature_mask/3' of the category $Category is used. |
amis_log_probability(+$ModelName, +$Category, +$EventList, -$FOM) | |
$ModelName | model name |
$Category | category name |
$EventList | list of events (list of lists of strings) |
$FOM | list of FOMs |
Computes the weight of each event in $EventList, and computes the probability of each event by normalizing the weights. |
The FOM of an event can be computed using the above built-in predicates. The computed FOMs are passed to the parser through the interfaces introduced in "How to use UP".
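The normalization step described for 'amis_log_probability/4' can be pictured with a small numeric example. The Python sketch below is a generic illustration under assumptions: the weights are made-up positive numbers, the function name is hypothetical, and how mayzup derives weights from model parameters is not shown.

```python
import math


def log_probabilities(weights):
    """Normalize positive weights to probabilities and return their logs."""
    total = sum(weights)
    return [math.log(w / total) for w in weights]


# Made-up weights for three competing events.
foms = log_probabilities([2.0, 1.0, 1.0])

# The normalized probabilities sum to one.
print(math.isclose(sum(math.exp(f) for f in foms), 1.0))
```

Because the results are log probabilities, they can be returned directly as FOMs and summed by the parser.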
The usage of mayzup is almost the same as that of up. For example, to use "mygrammar.lil", run the following command.
% mayzup -i -iter -l mygrammar