LiLFeS modules


In addition to the tools explained above, MAYZ provides LiLFeS modules to support grammar development. These modules can be used by loading them into lilfes (with the "-l" option) or into a parser.

Modules for grammar development
Modules for parsing
- Looking up a lexicon and templates
- Using an external tagger
Browsing grammar development or parsing
Using a parser in applications
- Store parse results in a database

Marking head, argument, and modifier

"mayz/markhead.lil" is a program for annotating each node in a tree with a head, argument, or modifier mark. Once you implement a set of marking rules, it annotates all nodes in a tree automatically.

The interfaces for marking heads are as follows. The first two refer to the MOD feature, while the third refers to the SYM feature to determine heads.

`head_tag(+$Tag)`
The node will be marked as a head if the MOD feature includes $Tag.

`nonhead_tag(+$Tag)`
The node will be marked as a non-head (argument or modifier) if the MOD feature includes $Tag. Arguments and modifiers are distinguished by other rules.

`head_table(+$Sym, +$Dir, +$SymList)`
$Sym	Symbol of the parent node
$Dir	Direction of searching a head ("left" or "right")
$SymList	List of symbols that should be marked as a head
When the symbol of the parent node is $Sym, the child nodes are searched in the direction $Dir (left-to-right if "left", right-to-left if "right"), and the node labeled with the first element of $SymList is marked as the head. If no child node is labeled with the first element, the next element is searched for, and so on. If an element of $SymList is itself a list, a node labeled with any symbol in that list is marked as the head. If no symbol is found, the leftmost node is marked as the head if $Dir is "left", and the rightmost node if $Dir is "right".
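The search that head_table/3 drives for a single node can be modeled as follows. This is a minimal Python sketch, not the LiLFeS implementation. The entries in the hypothetical HEAD_TABLE are examples only. Two details are our reading of the description above: a nested list means "any of these symbols, taken in scan order", and a parent with no table entry yields no head.

```python
from typing import Dict, List, Optional, Tuple, Union

# Hypothetical head_table/3 facts: parent symbol -> (direction, priority list).
# A nested list means "any of these symbols".
HEAD_TABLE: Dict[str, Tuple[str, List[Union[str, List[str]]]]] = {
    "S": ("left", ["VP", ["S", "SBAR"]]),
    "NP": ("right", [["NN", "NNS", "NNP"], "NP"]),
}


def find_head(parent: str, children: List[str]) -> Optional[int]:
    """Return the index of the head child, or None if no table entry applies."""
    if parent not in HEAD_TABLE or not children:
        return None
    direction, priorities = HEAD_TABLE[parent]
    # "left" scans left-to-right, "right" scans right-to-left.
    order = list(range(len(children)))
    if direction == "right":
        order.reverse()
    # Earlier entries of the priority list take precedence over later ones.
    for entry in priorities:
        targets = entry if isinstance(entry, list) else [entry]
        for i in order:
            if children[i] in targets:
                return i
    # Nothing found: leftmost child for "left", rightmost child for "right".
    return order[0]


print(find_head("S", ["NP", "VP", "."]))   # 1
print(find_head("NP", ["DT", "NN", "NN"]))  # 2 (scanned right-to-left)
print(find_head("NP", ["DT", "JJ"]))        # 1 (fallback: rightmost)
```

The outer loop runs over the priority list and the inner loop over child positions. As a result, a lower-priority symbol never wins over a higher-priority one, wherever the two occur.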

The following predicate marks heads in a parse tree using the above interfaces.

`mark_head(+$Tree)`
$Tree	parse tree
Marks the head in a parse tree using the following algorithm. If one of the daughters is already marked as "head", nothing is done. Otherwise, a daughter whose MOD feature includes a tag declared with 'head_tag/1' is marked as the head. Daughters whose MOD feature includes a tag declared with 'nonhead_tag/1' are excluded from consideration. Finally, the head is determined among the remaining daughters according to 'head_table/3'.
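The order of the checks in mark_head/1 can be sketched for the daughters of a single node as shown here. This Python model is illustrative only. The tag sets and table are hypothetical, the head table is simplified to plain symbols, and falling back to the leftmost daughter for a parent without a table entry is an assumption. A full implementation would also recurse over the whole tree.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set

HEAD_TAGS = {"HD"}             # hypothetical head_tag/1 facts
NONHEAD_TAGS = {"TMP", "ADV"}  # hypothetical nonhead_tag/1 facts
# Hypothetical head_table/3 facts, simplified to plain symbols (no nested lists).
HEAD_TABLE = {"S": ("left", ["VP", "S"])}


@dataclass
class Node:
    sym: str                                    # SYM feature
    mod: Set[str] = field(default_factory=set)  # tags in the MOD feature
    mark: Optional[str] = None                  # "head", "argument", "modifier"


def mark_head(parent: str, dtrs: List[Node]) -> None:
    """Mark the head among the daughters of one node (modifies dtrs in place)."""
    # 1. A daughter is already marked as head: nothing to do.
    if any(d.mark == "head" for d in dtrs):
        return
    # 2. A daughter carrying a head_tag in MOD becomes the head.
    for d in dtrs:
        if d.mod & HEAD_TAGS:
            d.mark = "head"
            return
    # 3. Daughters carrying a nonhead_tag in MOD are excluded.
    cands = [d for d in dtrs if not (d.mod & NONHEAD_TAGS)]
    if not cands:
        return
    # 4. Head table among the remaining daughters (assumed default: leftmost).
    direction, priorities = HEAD_TABLE.get(parent, ("left", []))
    order = cands if direction == "left" else cands[::-1]
    for sym in priorities:
        for d in order:
            if d.sym == sym:
                d.mark = "head"
                return
    order[0].mark = "head"


dtrs = [Node("NP", {"TMP"}), Node("NP"), Node("VP")]
mark_head("S", dtrs)
print([d.mark for d in dtrs])  # [None, None, 'head']
```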

The interfaces for marking modifiers and arguments are as follows. The program assumes that head marks have already been assigned. The first two refer to the MOD feature, while the rest refer to the SYM feature.

`argument_tag(+$Tag)`
If the MOD feature includes $Tag, the node is marked as an argument.

`modifier_tag(+$Tag)`
If the MOD feature includes $Tag, the node is marked as a modifier.

`head_argument_table(+$HeadSym, +$SymList)`
$HeadSym	symbol of the head
$SymList	list of symbols
If the symbol of the head is $HeadSym, a sibling node is marked as an argument if its symbol is included in $SymList.

`argument_table(+$Sym, +$SymList)`
$Sym	symbol of the mother
$SymList	list of symbols
If the symbol of the mother is $Sym, a sibling node is marked as an argument if its symbol is included in $SymList.

`left_argument_table(+$Sym, +$SymList)`
$Sym	symbol of the mother
$SymList	list of symbols
If the symbol of the mother is $Sym, a sibling node is marked as an argument if the node is to the left of the head and its symbol is included in $SymList.

`right_argument_table(+$Sym, +$SymList)`
$Sym	symbol of the mother
$SymList	list of symbols
If the symbol of the mother is $Sym, a sibling node is marked as an argument if the node is to the right of the head and its symbol is included in $SymList.

Using the above interfaces, the following predicate assigns argument or modifier marks to all nodes in a parse tree.

`mark_modifier(+$Tree)`
$Tree	parse tree
Nodes in $Tree are marked as modifiers or arguments using the following algorithm, whose rules are applied in this order. A node with a tag declared by 'argument_tag/1' is marked as "argument", and a node with a tag declared by 'modifier_tag/1' is marked as "modifier". Next, argument marks are assigned according to 'head_argument_table/2', 'argument_table/2', 'left_argument_table/2', and 'right_argument_table/2', in that order. All remaining nodes are marked as "modifier".
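The rule order of mark_modifier/1 can be modeled for the daughters of a single node as shown here. This is a Python sketch under stated assumptions: the rule tables are hypothetical stand-ins for the LiLFeS facts, and the head must already be marked. Nodes that already carry a mark (for example from exceptional marking) are skipped.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set

# Hypothetical marking rules (stand-ins for the LiLFeS facts).
ARGUMENT_TAGS = {"SBJ"}                    # argument_tag/1
MODIFIER_TAGS = {"ADV", "TMP"}             # modifier_tag/1
HEAD_ARGUMENT_TABLE = {"VB": {"NP", "S"}}  # head_argument_table/2
ARGUMENT_TABLE = {"PP": {"NP"}}            # argument_table/2
LEFT_ARGUMENT_TABLE = {"S": {"NP"}}        # left_argument_table/2
RIGHT_ARGUMENT_TABLE = {"VP": {"SBAR"}}    # right_argument_table/2


@dataclass
class Node:
    sym: str
    mod: Set[str] = field(default_factory=set)
    mark: Optional[str] = None


def mark_modifier(mother: str, dtrs: List[Node]) -> None:
    """Mark the non-head daughters of one node; a head must already be marked."""
    head = next(i for i, d in enumerate(dtrs) if d.mark == "head")
    head_sym = dtrs[head].sym
    for i, d in enumerate(dtrs):
        if d.mark is not None:  # head or exceptional marks are left alone
            continue
        if d.mod & ARGUMENT_TAGS:
            d.mark = "argument"
        elif d.mod & MODIFIER_TAGS:
            d.mark = "modifier"
        elif d.sym in HEAD_ARGUMENT_TABLE.get(head_sym, set()):
            d.mark = "argument"
        elif d.sym in ARGUMENT_TABLE.get(mother, set()):
            d.mark = "argument"
        elif i < head and d.sym in LEFT_ARGUMENT_TABLE.get(mother, set()):
            d.mark = "argument"
        elif i > head and d.sym in RIGHT_ARGUMENT_TABLE.get(mother, set()):
            d.mark = "argument"
        else:
            d.mark = "modifier"


s = [Node("NP"), Node("ADVP"), Node("VP", mark="head"), Node("NP")]
mark_modifier("S", s)
print([d.mark for d in s])  # ['argument', 'modifier', 'head', 'modifier']
```

In this example, the same symbol NP is marked as an argument to the left of the head and as a modifier to the right. This is exactly the distinction that the left/right tables express.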

The above predicate ignores nodes that have already been marked, so you can mark exceptional constructions before applying these tools. Alternatively, you can implement the following interface for marking exceptional trees; it is called whenever the above predicate tries to assign a mark to a node.

`mark_exceptional(+$Tree)`
$Tree	parse tree
Implemented by the user to assign marks to $Tree.

Binarizing a tree

"mayz/binarizer.lil" provides a tool to binarize a tree annotated with head, modifier, and argument marks.

`tree_binarize(+$Tree, -$BinTree)`
$Tree	input tree
$BinTree	binarized tree
$Tree is binarized into $BinTree.

This predicate performs head-centered binarization. The daughters to the right of the head are attached lower in the tree, and those to the left of the head are attached higher. If you need an exceptional binarization strategy, use the following interface, which is called for each node in a tree.
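The head-centered strategy can be illustrated for a single node with the following sketch. It is a Python model, not tree_binarize/2 itself. Reusing the mother's label for every intermediate node is an assumption, and daughters are plain strings here instead of recursively binarized subtrees.

```python
from typing import List, Tuple, Union

Tree = Union[str, Tuple[str, list]]


def binarize(label: str, dtrs: List[Tree], head: int) -> Tree:
    """Head-centered binarization of one node with daughters dtrs."""
    if len(dtrs) == 1:  # unary node: nothing to binarize
        return (label, list(dtrs))
    node: Tree = dtrs[head]
    # Right siblings are attached first (nearest first): they end up lower.
    for right in dtrs[head + 1:]:
        node = (label, [node, right])
    # Left siblings are attached afterwards (nearest first): they end up higher.
    for left in reversed(dtrs[:head]):
        node = (label, [left, node])
    return node


print(binarize("VP", ["ADVP", "VBD", "NP", "PP"], head=1))
# ('VP', ['ADVP', ('VP', [('VP', ['VBD', 'NP']), 'PP'])])
```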

`binarizer_preprocess(+$Tree, -$BinTree)`
$Tree	input tree
$BinTree	binarized tree

Pattern matching of trees

"mayz/treematch.lil" provides predicates for pattern matching of parse trees. It is useful when you use "treetrans" to convert parse trees. You can match and substitute parse trees using pattern rules.

A pattern on a parse tree is represented with the feature structure representation of a parse tree (i.e., the 'tree' type). In addition, you can use the 'tree_any' type, which matches zero or more parse trees. For example, consider the following pattern:

(tree &
 TREE_NODE\SYM\"S" &
 TREE_DTRS\[tree_any,
            (tree & TREE_NODE\SYM\"VP"),
            tree_any])

This pattern matches a tree whose top node is labeled "S" and which has at least one daughter labeled "VP". It matches regardless of how many daughters (including none) appear to the left and/or right of the "VP" tree. The trees matched by 'tree_any' are stored in the feature ANY_TREES\.
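The backtracking behind 'tree_any' can be sketched as follows. This is a Python model, not how LiLFeS unification works. Trees are (symbol, daughters) tuples, and the ANY sentinel plays the role of 'tree_any'. Returning the absorbed runs loosely corresponds to the ANY_TREES\ feature. A pattern symbol of None and daughters of None are our own shorthand for "unspecified".

```python
from typing import List, Optional

# A tree is (symbol, daughters); a leaf has an empty daughter list.
# In a pattern, symbol None matches any symbol, daughters None matches any
# daughters, and the ANY sentinel (our stand-in for tree_any) matches a run
# of zero or more sibling trees.
ANY = object()


def match(pat, tree) -> Optional[List[list]]:
    """Return the runs of trees absorbed by each ANY, or None on failure."""
    psym, pdtrs = pat
    tsym, tdtrs = tree
    if psym is not None and psym != tsym:
        return None
    if pdtrs is None:
        return []
    return match_seq(pdtrs, tdtrs)


def match_seq(pats, trees) -> Optional[List[list]]:
    if not pats:
        return [] if not trees else None
    first, rest = pats[0], pats[1:]
    if first is ANY:
        # Let ANY absorb 0, 1, 2, ... trees and backtrack until the rest matches.
        for k in range(len(trees) + 1):
            tail = match_seq(rest, trees[k:])
            if tail is not None:
                return [trees[:k]] + tail
        return None
    if not trees:
        return None
    here = match(first, trees[0])
    if here is None:
        return None
    tail = match_seq(rest, trees[1:])
    return None if tail is None else here + tail


# The "S containing a VP" pattern from the text.
pattern = ("S", [ANY, ("VP", None), ANY])
tree = ("S", [("NP", []), ("VP", [("VBZ", [])]), (".", [])])
print(match(pattern, tree))  # [[('NP', [])], [('.', [])]]
```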

The following predicates are provided for the matching and the substitution of parse trees using patterns.

`tree_match(+$Pattern, +$Tree)`
$Pattern	pattern on a parse tree ('tree' or 'tree_any')
$Tree	input parse tree ('tree')
Succeeds if the pattern matches the parse tree.
> ?- tree_match((tree & TREE_NODE\SYM\"SBAR" & TREE_DTRS\[TREE_NODE\(SYM\"RB" & WORD\SURFACE\"rather"), TREE_NODE\(SYM\"IN" & WORD\SURFACE\"than"), TREE_NODE\(SYM\"NP")]), (tree & TREE_DTRS\[tree_any & ANY_TREES\[_|_], tree & TREE_NODE\(SYM\"IN" & WORD\SURFACE\"than"), tree & TREE_NODE\HEAD_MARK\argument])).
yes

`tree_substitution(+$OutPattern, -$OutTree)`
$OutPattern	pattern on a parse tree ('tree' or 'tree_any')
$OutTree	output ('tree')
Converts a pattern on a parse tree (which may include 'tree_any') into an ordinary parse tree (without 'tree_any').

`tree_subst(+$InPattern, +$OutPattern, +$InTree, -$OutTree)`
$InPattern	pattern on an input parse tree ('tree' or 'tree_any')
$OutPattern	pattern on an output parse tree ('tree' or 'tree_any')
$InTree	input parse tree (tree)
$OutTree	output (tree)
The input pattern is matched against the input parse tree, and if this succeeds, the output pattern is converted into an output parse tree. That is, it is equivalent to the following operations: tree_match($InPattern, $InTree), tree_substitution($OutPattern, $OutTree). See the manual of "treetrans" for an example.
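The match-then-substitute sequence of tree_subst/4 can be modeled as shown here. This is a Python sketch, not the LiLFeS code. In LiLFeS the link between input and output patterns comes from structure sharing. Here it is approximated by a hypothetical named placeholder, AnyTrees(var), which captures a run of sibling trees in the input and splices it back in the output. The example rule lowers everything after the VP into the VP. It is a made-up illustration, not a rule from "treetrans".

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class AnyTrees:
    """Stand-in for tree_any; var names the captured run (hypothetical)."""
    var: str


def match(pat, tree, env: Dict[str, list]) -> bool:
    """tree_match: (symbol, daughters) patterns against concrete trees."""
    psym, pdtrs = pat
    tsym, tdtrs = tree
    return psym == tsym and match_seq(pdtrs, tdtrs, env)


def match_seq(pats, trees, env: Dict[str, list]) -> bool:
    """Match a daughter list; env is updated only if the whole list matches."""
    if not pats:
        return not trees
    first, rest = pats[0], pats[1:]
    if isinstance(first, AnyTrees):
        for k in range(len(trees) + 1):  # try absorbing 0, 1, 2, ... trees
            trial = dict(env)
            trial[first.var] = trees[:k]
            if match_seq(rest, trees[k:], trial):
                env.update(trial)
                return True
        return False
    trial = dict(env)
    if trees and match(first, trees[0], trial) and match_seq(rest, trees[1:], trial):
        env.update(trial)
        return True
    return False


def substitute(pat, env: Dict[str, list]):
    """tree_substitution: replace each AnyTrees with the trees it captured."""
    sym, dtrs = pat
    out = []
    for d in dtrs:
        if isinstance(d, AnyTrees):
            out.extend(env[d.var])
        else:
            out.append(substitute(d, env))
    return (sym, out)


def tree_subst(in_pat, out_pat, tree) -> Optional[tuple]:
    env: Dict[str, list] = {}
    if not match(in_pat, tree, env):
        return None
    return substitute(out_pat, env)


in_pat = ("S", [AnyTrees("L"), ("VP", [AnyTrees("V")]), AnyTrees("R")])
out_pat = ("S", [AnyTrees("L"), ("VP", [AnyTrees("V"), AnyTrees("R")])])
tree = ("S", [("NP", []), ("VP", [("VBZ", [])]), (".", [])])
print(tree_subst(in_pat, out_pat, tree))
# ('S', [('NP', []), ('VP', [('VBZ', []), ('.', [])])])
```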

Looking up a lexicon and templates

"mayz/grammar.lil" provides tools for looking up a lexicon and templates stored in databases.

`import_lexicon(+$LexiconFile, +$TemplateFile)`
$LexiconFile	file name of a lexicon
$TemplateFile	file name of a template database
Imports a lexicon and a template database.

`lookup_lexicon(+$Word, -$TempNameList)`
$Word	input word
$TempNameList	list of lex_template
Looks up a lexicon and returns a list of template names.

`lookup_template(+$TempName, -$Sign)`
$TempName	lex_template
$Sign	feature structure
Looks up a template in a template database and returns the feature structure of the template.

To use lookup_lexicon/2, the following interfaces must be implemented to compute a database key from an input word.

`lexicon_lookup_key(+$Word, -$Key)`
$Word	input word
$Key	key of a lexicon database

`unknown_word_lookup_key(+$Word, -$Key)`
$Word	input word
$Key	key of a lexicon database for an unknown word
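How the two key interfaces might cooperate inside lookup_lexicon/2 is sketched here in Python. Several parts are assumptions. The text does not state that the unknown-word key is tried only when the ordinary key finds nothing. The word representation, the key format, the "<UNK>" key and the template names are all hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical lexicon database: key -> list of template names (lex_template).
LEXICON: Dict[Tuple[str, str], List[str]] = {
    ("run", "VB"): ["v_intrans", "v_trans"],
    ("<UNK>", "NN"): ["n_common"],
}


def lexicon_lookup_key(word: Dict[str, str]) -> Tuple[str, str]:
    # Hypothetical key: base form + POS.
    return (word["base"], word["pos"])


def unknown_word_lookup_key(word: Dict[str, str]) -> Tuple[str, str]:
    # Hypothetical key for unknown words: POS only.
    return ("<UNK>", word["pos"])


def lookup_lexicon(word: Dict[str, str]) -> List[str]:
    """Assumed behavior: try the ordinary key first, then the unknown-word key."""
    templates = LEXICON.get(lexicon_lookup_key(word))
    if templates is None:
        templates = LEXICON.get(unknown_word_lookup_key(word), [])
    return templates


print(lookup_lexicon({"base": "run", "pos": "VB"}))    # ['v_intrans', 'v_trans']
print(lookup_lexicon({"base": "blorf", "pos": "NN"}))  # ['n_common']
```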

Using an external tagger

"mayz/tagger.lil" provides tools for using an external tagger. The following predicates are used for the initialization and termination of an external tagger.

`initialize_external_tagger(+$Name, +$Arguments)`
$Name	command name of a tagger (string)
$Arguments	command-line arguments of a tagger (list of strings)
Initializes an external tagger.

`terminate_external_tagger`
Terminates an external tagger.

`is_external_tagger_initialized`
Succeeds if a tagger is already initialized.

After the initialization, the following predicates are used to turn the tagger on and off.

`enable_external_tagger`
Turns on the tagger.

`disable_external_tagger`
Turns off the tagger.

`is_external_tagger_enabled`
Succeeds if a tagger is turned on.

The following predicate passes an input sentence to the tagger and returns the resulting string.

`external_tagger(+$Input, -$Output)`
$Input	input string
$Output	output string
When the tagger is turned on, $Input is passed to the tagger, and the tagger's output is returned as $Output. When the tagger is turned off, $Input is returned unchanged as $Output.
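The life cycle above (initialize, enable/disable, tag, terminate) can be modeled in Python with a child process connected by pipes. This is a sketch of the documented semantics, not of how "mayz/tagger.lil" is implemented. The one-line-in, one-line-out protocol is an assumption, and FAKE_TAGGER is a hypothetical stand-in that appends /NN to every token.

```python
import subprocess
import sys
from typing import List, Optional


class ExternalTagger:
    """Python model of the initialize/enable/disable/terminate life cycle."""

    def __init__(self) -> None:
        self.proc: Optional[subprocess.Popen] = None
        self.enabled = False

    def initialize(self, name: str, arguments: List[str]) -> None:
        # Start the tagger as a child process talking over pipes.
        self.proc = subprocess.Popen([name] + arguments, stdin=subprocess.PIPE,
                                     stdout=subprocess.PIPE, text=True)

    def is_initialized(self) -> bool:
        return self.proc is not None

    def enable(self) -> None:
        self.enabled = True

    def disable(self) -> None:
        self.enabled = False

    def tag(self, line: str) -> str:
        # Disabled (or never started): pass the input through unchanged.
        if not self.enabled or self.proc is None:
            return line
        # Assumed protocol: one input line in, one tagged line out.
        self.proc.stdin.write(line + "\n")
        self.proc.stdin.flush()
        return self.proc.stdout.readline().rstrip("\n")

    def terminate(self) -> None:
        if self.proc is not None:
            self.proc.stdin.close()  # EOF tells the tagger to exit
            self.proc.wait()
            self.proc.stdout.close()
            self.proc = None


# A stand-in "tagger" that appends /NN to every token (for demonstration only).
FAKE_TAGGER = (
    "import sys\n"
    "for line in iter(sys.stdin.readline, ''):\n"
    "    print(' '.join(w + '/NN' for w in line.split()), flush=True)\n"
)

tagger = ExternalTagger()
tagger.initialize(sys.executable, ["-c", FAKE_TAGGER])
tagger.enable()
print(tagger.tag("John runs"))  # John/NN runs/NN
tagger.disable()
print(tagger.tag("John runs"))  # John runs
tagger.terminate()
```

Both sides flush after every line. Without that, each side could block waiting for data still sitting in the other's buffer.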

Browsing the process of tree transformation and grammar extraction

"mayz/morivtrans.lil" is a module for browsing the process of tree transformation (treetrans) and lexicon extraction (lexextract). Using a web browser supporting XHTML and XSLT (e.g. Firefox) or MoriV, you can browse tree structures and feature structures in the process of grammar development.

This module works as an HTTP server and a CGI. First, load this module together with modules for tree transformation and lexicon extraction.

% lilfes -l tree_transformation_module -l lexicon_extraction_module -l mayz/morivtrans

Next, run the "cgi" command.

> ?- cgi.

An HTTP server then starts and waits for a connection. In your browser, open "/cgi-lilfes/moriv?" on port 27109 of the host where lilfes is running:

http://server_host:27109/cgi-lilfes/moriv?

Enter a Penn Treebank-style tree in the form, and press the "Input" button. A menu appears in the lower-left area and a parse tree in the lower-right area. You can browse trees and feature structures using the lower-left menu.

Browsing the results of parsing

"mayz/morivparser.lil" is a module for browsing the results of parsing with a grammar and a disambiguation model developed with MAYZ. Using a web browser supporting XHTML and XSLT (e.g. Firefox) or MoriV (http://www-tsujii.is.s.u-tokyo.ac.jp/moriv/), you can browse parse trees and signs of parse results.

To use this module, you need to implement the following interfaces, which provide the symbols used to display a brief parse tree of a parse result. They are defined in "mayz/display.lil".

`sign_label(+$Sign, -$Symbol)`
$Sign	sign
$Symbol	string
Returns a symbol representing the sign.

`lexname_label(+$LexName, -$Symbol)`
$LexName	LEX_NAME (the 2nd argument of lexical_entry/3)
$Symbol	string
Returns a symbol representing LEX_NAME.

`schema_edge_label_unary(+$SchemaName, -$Label)`
$SchemaName	schema name
$Label	edge symbol
Returns a symbol assigned to the edge of unary schema application.

`schema_edge_label_binary(+$SchemaName, -$LeftLabel, -$RightLabel)`
$SchemaName	schema name
$LeftLabel	symbol of the left edge
$RightLabel	symbol of the right edge
Returns symbols assigned to the edges of binary schema application.

`schema_label(+$SchemaName, -$Label)`
$SchemaName	schema name
$Label	symbol
Returns a symbol representing a schema name.

`lex_template_label(+$LexTemplate, -$Label)`
$LexTemplate	lex_template
$Label	symbol
Returns a symbol representing a template name.

`word_label(+$Word, -$Label)`
$Word	word
$Label	symbol
Returns a symbol representing a word.

`extent_label(+$Extent, -$Label)`
$Extent	extent
$Label	symbol
Returns a symbol representing an extent (an element of the 2nd argument of 'sentence_to_word_lattice/2').

This module works as an HTTP server and a CGI. When you run a parser, load "mayz/morivparser.lil", and execute the "cgi" command. For example, when you use "mayzup",

% mayzup -l grammar_module -l mayz/morivparser -e cgi

An HTTP server then starts and waits for a connection. In your browser, open "/cgi-lilfes/moriv?" on port 27109 of the host where lilfes is running:

http://server_host:27109/cgi-lilfes/moriv?

Enter a sentence in the form, and press the "Input" button. You will see the brief result of parsing and a menu in the lower-left area. You can browse parse trees and feature structures using the menu.

Browsing a parse chart

"mayz/morivchart.lil" is a module for browsing a parse chart (CKY table). Using a web browser supporting XHTML and XSLT (e.g. Firefox) or MoriV, you can browse the intermediate parse results generated during parsing.

To use this module, you need to implement the interfaces for getting the symbols of parse trees. The interfaces are defined in "mayz/display.lil". For details, see Browsing the results of parsing.

When you run a parser, load "mayz/morivchart" and execute the "cgi" command to start an HTTP server. Then, access the server with your browser. Enter a sentence in the form, and the chart appears in the lower-left area. Clicking a chart cell shows the edges in that cell in the lower-right area.

Browsing lexical entries

"mayz/morivgrammar.lil" is a module for browsing a lexicon using a web browser supporting XHTML and XSLT (e.g. Firefox) or MoriV. You can browse the list of lexical entries assigned to a word and their feature structures.

To use this module, you need to implement the interfaces defined in "mayz/display.lil". For details, see Browsing the results of parsing.

When you run a parser, load "mayz/morivgrammar" and execute the "cgi" command to start an HTTP server. Then, access the server with your browser. Enter a word/POS in the form to see a list of lexical entries. Clicking a link in the list shows the feature structure of that lexical entry in the lower-right frame.

Evaluating coverage

"mayz/coverage.lil" is a module to measure the coverage obtained by a grammar developed with MAYZ. Together with a grammar module, load "mayz/coverage.lil", and execute the following predicate.

`eval_coverage(+$Lexbank, +$Lexicon, +$Template, +$OutputFile)`
$Lexbank	name of a lexbank used for the evaluation
$Lexicon	file name of a lexicon
$Template	file name of a template database
$OutputFile	name of the file to which results are written

For the evaluation of coverage, a lexbank of an unseen corpus is used. Before the evaluation, you need to make a lexbank using "treetrans" and "lexextract".

Evaluating parse accuracy

"mayz/evalparse.lil" is a module for evaluating the accuracy of parsing with a grammar and a probabilistic model developed with MAYZ. By implementing an interface to measure the number of correct answers for a sentence, you can measure the accuracy for the whole test corpus.

For the evaluation, the following interface must be implemented.

`eval_parse(+$Best, +$Correct, +$TermList, -$NumAnswers, -$NumOutputs, -$NumCorrects, -$NumPartials, -$Errors)`
$Best	parse_tree output by a parser
$Correct	correct parse_tree
$TermList	list of terminal nodes of a derivation (corresponding to a lexbank)
$NumAnswers	Number of answers
$NumOutputs	Number of outputs
$NumCorrects	Number of exactly correct outputs
$NumPartials	Number of partially correct outputs
$Errors	list of strings (each element is output to the result file)

When you run a parser, load "mayz/evalparse.lil", and execute the following predicate. The result of evaluation is output to a file.

`eval_parse_file(+$Derivbank, +$OutputFile)`
$Derivbank	name of a derivbank
$OutputFile	name of an output file
The accuracy of parsing is measured against $Derivbank, and the result is output to $OutputFile.
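One way to turn the per-sentence counts returned by eval_parse/8 into corpus-level figures is sketched here. The manual does not define the reported measures. Micro-averaged precision (correct / outputs), recall (correct / answers) and their F-score are an assumption for illustration, not necessarily what eval_parse_file/2 writes.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, List


@dataclass
class SentenceResult:
    """The output arguments of one eval_parse/8 call."""
    num_answers: int
    num_outputs: int
    num_corrects: int
    num_partials: int
    errors: List[str] = field(default_factory=list)


def summarize(results: Iterable[SentenceResult]) -> Dict[str, float]:
    results = list(results)
    answers = sum(r.num_answers for r in results)
    outputs = sum(r.num_outputs for r in results)
    corrects = sum(r.num_corrects for r in results)
    # Assumed definitions: micro-averaged precision/recall over exact matches.
    precision = corrects / outputs if outputs else 0.0
    recall = corrects / answers if answers else 0.0
    f_score = (2 * precision * recall / (precision + recall)
               if precision + recall else 0.0)
    return {"precision": precision, "recall": recall, "f_score": f_score}


scores = summarize([SentenceResult(10, 8, 6, 1), SentenceResult(5, 5, 4, 0)])
print({k: round(v, 3) for k, v in scores.items()})
# {'precision': 0.769, 'recall': 0.667, 'f_score': 0.714}
```

Summing the counts before dividing weights long sentences more heavily than averaging per-sentence ratios would.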

Store parse results in a database

"mayz/parseall.lil" is a LiLFeS module that stores parse results in a LiLFeS database (lildb). Each line of the input text is parsed, and the result is stored in the database with the line number of the input as its key. If parsing fails, the stored result indicates the reason for the failure with the type parse_error or one of its subtypes.

This module provides the following predicates.

`parse_all(+$Input, +$Output)`
$Input	name of the input file
$Output	name of the database
Parses each line of the input file $Input and stores the results in the database $Output.

`parse_all(+$Output)`
$Output	name of the database
Parses each line of the standard input and stores the results in the database $Output.
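The storage scheme described above (one entry per input line, keyed by line number, with an error value on failure) can be modeled as follows. This Python sketch uses a dict in place of lildb. ParseError stands in for the parse_error type hierarchy, and toy_parse is a hypothetical parser for demonstration. Numbering lines from 1 is an assumption.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, Union


@dataclass
class ParseError:
    """Stand-in for the parse_error type (the real module uses subtypes)."""
    reason: str


class ParseFailure(Exception):
    pass


def parse_all(lines: Iterable[str],
              parse: Callable[[str], str]) -> Dict[int, Union[str, ParseError]]:
    """Parse each input line and store the result under its line number."""
    db: Dict[int, Union[str, ParseError]] = {}
    for lineno, line in enumerate(lines, start=1):  # 1-based: an assumption
        try:
            db[lineno] = parse(line.rstrip("\n"))
        except ParseFailure as e:
            db[lineno] = ParseError(str(e))
    return db


def toy_parse(sentence: str) -> str:
    # Hypothetical parser used only for this demonstration.
    words = sentence.split()
    if not words:
        raise ParseFailure("empty input")
    if len(words) > 3:
        raise ParseFailure("sentence too long")
    return "(S " + " ".join(words) + ")"


print(parse_all(["John runs\n", "\n", "a b c d\n"], toy_parse))
# {1: '(S John runs)', 2: ParseError(reason='empty input'), 3: ParseError(reason='sentence too long')}
```

The single-argument variant corresponds to passing `sys.stdin` as `lines`. Because failures are stored instead of skipped, every line number appears as a key, so results can be aligned with the input afterwards.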


MIYAO Yusuke (yusuke@is.s.u-tokyo.ac.jp)