How to use a grammar

This chapter explains how to parse sentences with a grammar developed with the MAYZ toolkit.

How to use "UP"
How to use a lexicon and templates
How to use probabilistic models

How to use "UP"

The MAYZ toolkit supports the development of a lexicon and templates, but parsing sentences with a grammar developed with MAYZ requires a separate parser. The MAYZ package includes "UP", an efficient general-purpose parser for unification-based grammars. By implementing the interfaces required by UP, you can parse sentences with the developed grammar.

To use UP, interfaces for accessing a grammar and probabilistic models must be implemented. The interfaces are defined in "mayz/parser.lil".

The minimum set of UP interfaces required for parsing is as follows. Grammar writers need to implement all of them, except where noted otherwise.

`sentence_to_word_lattice(+$Input, -$WordLattice)`
$Input	input sentence
$WordLattice	list of extent
Splits an input sentence $Input into words, and returns a word lattice $WordLattice.

`lexical_entry(+$Word, -$LexName)`
$Word	input word
$LexName	name of a lexical entry
Returns the name of a lexical entry assigned to $Word. A single word can be assigned multiple lexical entry names.

`lexical_entry_sign(+$LexName, -$Sign)`
$LexName	name of a lexical entry
$Sign	sign of a lexical entry
Returns the sign of a lexical entry. Each $LexName must be assigned exactly one sign.

`id_schema_unary(+$SchemaName, +$Dtr, -$Mother, -$DCP)`
$SchemaName	schema name
$Dtr	sign of the daughter
$Mother	sign of the mother
$DCP	LiLFeS program executed after schema application
Applies a unary schema. If your grammar does not require unary rules, this need not be implemented.

`id_schema_binary(+$SchemaName, +$Left, +$Right, -$Mother, -$DCP)`
$SchemaName	schema name
$Left	sign of the left daughter
$Right	sign of the right daughter
$Mother	sign of the mother
$DCP	LiLFeS program executed after schema application
Applies a binary schema.

`root_sign($Sign)`
$Sign	sign of the root node
Specifies the condition that the sign of a root node must satisfy.

`reduce_sign(+$InSign, -$OutSign, -$SignPlus)`
$InSign	the sign of the mother of schema application
$OutSign	a reduced sign
$SignPlus	information removed from $OutSign
This predicate is applied to the mother sign after a schema application succeeds. In the subsequent parsing process, $OutSign is used instead of $InSign. By removing unnecessary information (e.g. daughter structures) from $InSign, equivalent $OutSigns are factored and treated as a single sign in the subsequent process. $SignPlus can hold the information removed from the sign; it is stored in SIGN_PLUS of 'edge_link'.
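
The factoring idea behind 'reduce_sign/3' can be sketched in Python. This is only an analogy, not LiLFeS code and not how UP is implemented: signs are modelled as plain dicts, and the key names "cat" and "dtrs" are invented for the sketch.

```python
# Hypothetical illustration of the factoring idea behind reduce_sign/3.
# Signs are plain dicts; the keys "cat" and "dtrs" are invented here and
# are not part of MAYZ or UP.

def reduce_sign(in_sign):
    """Split a mother sign into (out_sign, sign_plus).

    out_sign keeps what later schema applications need to see;
    sign_plus keeps the removed information (here, the daughters).
    """
    out_sign = {k: v for k, v in in_sign.items() if k != "dtrs"}
    sign_plus = {"dtrs": in_sign["dtrs"]}
    return out_sign, sign_plus


def factor(mother_signs):
    """Group mothers whose reduced signs are equal into one chart entry."""
    chart = {}
    for sign in mother_signs:
        out_sign, sign_plus = reduce_sign(sign)
        key = tuple(sorted(out_sign.items()))  # hashable form of out_sign
        chart.setdefault(key, []).append(sign_plus)
    return chart


mothers = [
    {"cat": "VP", "dtrs": ("V", "NP")},
    {"cat": "VP", "dtrs": ("VP", "PP")},
    {"cat": "NP", "dtrs": ("NP", "PP")},
]
chart = factor(mothers)
```

In this sketch the two VP mothers differ only in their daughters, so they collapse into one chart entry that carries two pieces of removed information, while the NP mother stays separate.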

"mayz/sample_hpsg.lil" is an example grammar of HPSG and includes a sample implementation of the above interfaces.

Since the above interfaces do not have access to probabilistic models, the parser cannot perform disambiguation. If your grammar implements only the above interfaces, run UP with the option "-nofom". For example, to use "mayz/sample_hpsg.lil", run the following command.

% up -i -nofom -l mayz/sample_hpsg
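
To make the division of labour between the parser and the grammar interfaces more concrete, the following Python sketch shows a toy CKY-style chart parser that calls functions named after the interfaces above. It is an analogy under strong assumptions, not a description of UP's actual algorithm: signs are plain strings, the grammar contents are invented, an extent is assumed to be a (left, right, word) triple, and unary schemas, $DCP, and 'reduce_sign/3' are omitted.

```python
# Toy chart parser showing where each interface might be called.
# Everything below (signs as strings, the tiny grammar, the extent format)
# is a hypothetical simplification, not the behaviour of UP itself.

LEXICON = {"she": ["pron"], "saw": ["verb_trans"], "stars": ["noun"]}
LEX_SIGNS = {"pron": "NP", "verb_trans": "V", "noun": "NP"}
BINARY_SCHEMAS = {
    "head_comp": {("V", "NP"): "VP"},
    "subj_head": {("NP", "VP"): "S"},
}


def sentence_to_word_lattice(sentence):
    # One extent per word: (left position, right position, word).
    return [(i, i + 1, w) for i, w in enumerate(sentence.split())]


def lexical_entry(word):
    return LEXICON.get(word, [])


def lexical_entry_sign(lex_name):
    return LEX_SIGNS[lex_name]


def id_schema_binary(schema_name, left, right):
    # Returns the mother sign, or None if the schema does not apply.
    return BINARY_SCHEMAS[schema_name].get((left, right))


def root_sign(sign):
    return sign == "S"


def parse(sentence):
    lattice = sentence_to_word_lattice(sentence)
    n = max(right for _, right, _ in lattice)
    chart = {}
    # Terminal cells: look up lexical entry names, then their signs.
    for left, right, word in lattice:
        for lex_name in lexical_entry(word):
            chart.setdefault((left, right), set()).add(lexical_entry_sign(lex_name))
    # Larger cells: try every binary schema on every pair of adjacent signs.
    for width in range(2, n + 1):
        for left in range(0, n - width + 1):
            right = left + width
            for mid in range(left + 1, right):
                for l_sign in chart.get((left, mid), ()):
                    for r_sign in chart.get((mid, right), ()):
                        for name in BINARY_SCHEMAS:
                            mother = id_schema_binary(name, l_sign, r_sign)
                            if mother is not None:
                                chart.setdefault((left, right), set()).add(mother)
    # Keep only signs covering the whole sentence that satisfy root_sign.
    return [s for s in chart.get((0, n), ()) if root_sign(s)]


result = parse("she saw stars")
```

Because no FOM is computed here, the sketch returns every root sign it finds without ranking them, which mirrors the "-nofom" setting described above.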

When you need disambiguation, the following interfaces must also be implemented. Once they are implemented, UP computes figures of merit (FOMs) during parsing, and the best analysis can be obtained using 'best_fom_sign/2' etc. Since FOMs are summed up, use log probabilities when you apply probabilistic models.

`fom_root(+$Sign, -$FOM)`
$Sign	sign of the root node
$FOM	FOM of the root node
Returns FOM of the root node.

`fom_binary(+$RuleName, +$LeftDtr, +$RightDtr, +$MotherSign, +$SignPlus, -$FOM)`
$RuleName	schema name
$LeftDtr	sign of the left daughter
$RightDtr	sign of the right daughter
$MotherSign	sign of the mother
$SignPlus	3rd argument of 'reduce_sign/3'
$FOM	FOM
Returns FOM of binary schema application.

`fom_unary(+$RuleName, +$Dtr, +$MotherSign, +$SignPlus, -$FOM)`
$RuleName	schema name
$Dtr	sign of the daughter
$MotherSign	sign of the mother
$SignPlus	3rd argument of 'reduce_sign/3'
$FOM	FOM
Returns FOM of unary schema application.

`fom_terminal(+$LexName, +$Sign, +$SignPlus, -$FOM)`
$LexName	LEX_NAME (the second argument of 'lexical_entry/2')
$Sign	sign of a lexical entry
$SignPlus	3rd argument of 'reduce_sign/3'
$FOM	FOM
Returns FOM of a terminal sign.

`fom_lexical_entry(+$Word, +$LexName, -$FOM)`
$Word	word
$LexName	LEX_NAME (the second argument of 'lexical_entry/2')
$FOM	FOM
Returns FOM of a lexical entry.

When your grammar implements the above interfaces, run UP with the option "-fom" or "-iter". For example, when the grammar file is "mygrammar.lil", run the following command.

% up -i -iter -l mygrammar
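
The reason log probabilities are recommended for FOMs can be checked with a short Python sketch: summing log probabilities is equivalent to multiplying the probabilities themselves, so the analysis with the highest summed FOM is also the most probable one. The step probabilities below are invented, and the sketch does not reproduce how UP accumulates FOMs internally.

```python
import math

# Hypothetical probabilities for the steps (lexical entries, schema
# applications, root) of two competing analyses of one sentence.
derivation_a = [0.5, 0.25, 0.8]
derivation_b = [0.5, 0.5, 0.1]


def total_fom(step_probs):
    """Sum of log probabilities, i.e. the log of the product."""
    return sum(math.log(p) for p in step_probs)


fom_a = total_fom(derivation_a)
fom_b = total_fom(derivation_b)

# The analysis with the larger summed FOM has the larger product.
best = "a" if fom_a > fom_b else "b"
```

Here the products are 0.1 for the first analysis and 0.025 for the second, so the first has the higher summed FOM.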

See the manual of UP for other functions of UP.

How to use a lexicon and templates

MAYZ provides functions only for retrieving a lexicon and templates from a database. Grammar developers are expected to implement the interfaces of UP themselves. For details, see "How to use UP".

MAYZ provides the following tools for accessing the databases of a lexicon and templates. They are implemented in "mayz/grammar.lil". MAYZ also provides a tool for employing an external tagger.

`import_lexicon($LexFile, $TemplateFile)`
$LexFile	file name of a lexicon
$TemplateFile	file name of a template database
Imports a lexicon and a template database.

`lookup_lexicon(+$Word, -$TempNameList)`
$Word	a feature structure representing a "word"
$TempNameList	a list of `lex_template`
Returns a list of template names assigned to a word by looking up a lexicon.

`lookup_template(+$TempName, -$Template)`
$TempName	lex_template
$Template	a feature structure
Returns a feature structure of a lexical entry template by looking up a template database.

To use the above tools, you need to implement the following interfaces.

`lexicon_lookup_key(+$Word, -$Key)`
$Word	a feature structure representing a "word"
$Key	a key for looking up a lexicon
Given a feature structure representing a "word" (an element of the list returned by 'sentence_to_word_lattice/2'), this interface returns a key for looking up a lexicon (corresponding to the third argument of 'inverse_lexical_rule/5' and the fourth argument of 'lexical_rule/5').

`unknown_word_lookup_key(+$Word, -$Key)`
$Word	a feature structure representing a "word"
$Key	a key for looking up a lexicon
Given a feature structure representing a "word", this interface returns a key for looking up a lexicon for an unknown word.

When constructing a lexical entry in 'lexical_entry/2' and 'lexical_entry_sign/2', use the tools 'lookup_lexicon/2' and 'lookup_template/2'.
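
The flow from a word to a lexical entry sign can be sketched in Python as follows. This is an analogy only: the dictionaries stand in for the lexicon and template databases, the key format, template names, and feature values are invented, and the fallback from the ordinary key to the unknown-word key is an assumption about how the two key interfaces are combined.

```python
# Toy stand-ins for the lexicon and template databases. All keys, template
# names, and feature values are invented for illustration.
LEXICON = {
    "saw/VBD": ["verb_trans", "verb_intrans"],
    "UNKNOWN/NN": ["noun_common"],
}
TEMPLATES = {
    "verb_trans": {"cat": "V", "comps": ["NP"]},
    "verb_intrans": {"cat": "V", "comps": []},
    "noun_common": {"cat": "N", "comps": []},
}


def lexicon_lookup_key(word):
    # Hypothetical key format: surface form plus part of speech.
    return f"{word['surface']}/{word['pos']}"


def unknown_word_lookup_key(word):
    # Hypothetical key format for unknown words: part of speech only.
    return f"UNKNOWN/{word['pos']}"


def lookup_lexicon(word):
    # Assumed behaviour: try the ordinary key first, then the unknown-word key.
    key = lexicon_lookup_key(word)
    if key in LEXICON:
        return LEXICON[key]
    return LEXICON.get(unknown_word_lookup_key(word), [])


def lookup_template(temp_name):
    return TEMPLATES[temp_name]


def lexical_entry(word):
    # One lexical entry name per template name found in the lexicon.
    yield from lookup_lexicon(word)


def lexical_entry_sign(lex_name):
    # In this sketch the template itself serves as the sign.
    return lookup_template(lex_name)


known = {"surface": "saw", "pos": "VBD"}
unknown = {"surface": "blorf", "pos": "NN"}
signs = [lexical_entry_sign(name) for name in lexical_entry(known)]
```

Keeping the key computation in two small functions means only those functions change when the representation of a "word" changes, which is the role 'lexicon_lookup_key/2' and 'unknown_word_lookup_key/2' play for the grammar writer.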

How to use probabilistic models

Probabilistic models developed using unimaker or forestmaker can be used as a figure-of-merit (FOM) model in UP. MAYZ provides a parser, mayzup, specialized for the probabilistic models developed with MAYZ. This parser provides built-in predicates for computing FOMs (log probabilities) using the interfaces used in the development of probabilistic models, i.e., extract_XXX_event and feature_mask/3.

The following predicates are provided only in mayzup.

`init_amis_model(+$ModelName, +$ModelFile)`
$ModelName	model name
$ModelFile	name of the parameter file
Initializes a model by reading parameters from $ModelFile, and also incorporates the corresponding feature_masks.

`delete_amis_model(+$ModelName)`
$ModelName	model name
Deletes a model created by 'init_amis_model/2'.

`amis_event_weight(+$ModelName, +$Category, +$Event, -$FOM)`
$ModelName	model name
$Category	category name
$Event	event (list of strings)
$FOM	FOM of the event (log probability)
Returns FOM (log probability) of the event represented as a list of strings. 'feature_mask/3' of the category $Category is used.

`amis_log_probability(+$ModelName, +$Category, +$EventList, -$FOM)`
$ModelName	model name
$Category	category name
$EventList	list of events (list of lists of strings)
$FOM	list of FOMs
Computes the weight of each event in $EventList, and computes the log probability of each event by normalizing the weights.
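
The normalization step can be illustrated with a small Python sketch. It assumes that each event has a positive, unnormalized weight and that the probability of an event is its weight divided by the sum of all weights in the list; whether mayzup computes the values in exactly this way is an assumption, and the weights are invented.

```python
import math


def log_probabilities(weights):
    """Normalize positive weights and return the log probability of each."""
    total = sum(weights)
    return [math.log(w / total) for w in weights]


# Hypothetical unnormalized weights for three competing events.
weights = [2.0, 1.0, 1.0]
foms = log_probabilities(weights)
```

Exponentiating the resulting values gives probabilities that sum to one, so they can be passed to the parser as FOMs and summed like any other log probability.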

FOM of an event can be computed using the above built-in predicates. Computed FOMs are passed to a parser using the interfaces introduced in "How to use UP".

The usage of mayzup is almost the same as that of up. For example, when the grammar file is "mygrammar.lil", run the following command.

% mayzup -i -iter -l mygrammar

MIYAO Yusuke (yusuke@is.s.u-tokyo.ac.jp)