treetrans: Tool for tree transformation
This is a tool for the conversion of parse trees using pattern rules.
treetrans [options] rule_module input_file output_database
rule_module | LiLFeS program in which pattern rules are implemented
input_file | Input treebank (text format)
output_database | Output treebank (lildb format)
Options
-v | print debug messages
-vv | print many debug messages
This tool reads parse trees from a text file, applies tree conversion
rules to each input tree, and writes the results to a lildb-style
database.
How to input parse trees
"treetrans" calls 'input_parse_tree/2' and reads parse trees from a
text file. 'input_parse_tree/2' is declared in "treetrans.lil" as an
interface of "treetrans". It has no default implementation, so the
grammar developer must implement it. Each line of the input file is
passed as the first argument of 'input_parse_tree/2', and the parse
tree must be returned in the second argument. Parse trees must be
represented with the types defined in "treetypes.lil".
input_parse_tree(+$String, -$Tree)
+$String | A line in an input file
-$Tree | A parse tree
Reads a parse tree from a line in the input file.
If parse trees are written in the Penn Treebank-style format, you can
simply use 'input_ptb_parse_tree/2' defined in "treetrans.lil". To
use 'input_ptb_parse_tree/2', you need to implement the following
interfaces defined in "treeio.lil".
ptb_empty_category(-$Category)
-$Category | The value of "SYM" to be regarded as an empty category
Specify a preterminal symbol that should be regarded as an empty
category. "SYM" is a feature defined in "treetypes.lil".

ptb_preprocess_word(+$Input, -$Output)
+$Input | input word
-$Output | preprocessed input word
Apply preprocessing to an input word. For example, you can replace
special characters and convert letters to lowercase.

ptb_preprocess_pos(+$Input, -$Output)
+$Input | input POS
-$Output | preprocessed POS
Apply preprocessing to an input POS.

ptb_delete_pos(-$POS)
-$POS | POS
Specify a POS that should be ignored. Subtrees that contain only
ignored POSs are also ignored. $POS refers to the POS after
'ptb_preprocess_pos/2' has been applied.
After implementing them, call 'input_ptb_parse_tree/2' from
'input_parse_tree/2', as in the following example.
ptb_empty_category("-NONE-").
ptb_preprocess_word($In, $Out) :- to_lower($In, $Out).
ptb_preprocess_pos($POS, $POS).
ptb_delete_pos(".").
ptb_delete_pos("""").
input_parse_tree($String, $Tree) :-
    input_ptb_parse_tree($String, $Tree).
If the input file is written in another format, implement
'input_parse_tree/2' yourself.
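To make the role of the four hooks concrete, the following Python sketch models how a Penn Treebank-style line could be read while applying them. It is only an illustration under stated assumptions, not the code in "treeio.lil": the dictionary tree layout, the function names, and the "empty" flag are hypothetical, and every bracket is assumed to carry a label.

```python
# Hypothetical stand-ins for the four hooks. The names echo the LiLFeS
# predicates, but none of this is the treetrans implementation.
EMPTY_CATEGORY = "-NONE-"   # cf. ptb_empty_category/1
DELETED_POS = {"."}         # cf. ptb_delete_pos/1


def preprocess_word(word):  # cf. ptb_preprocess_word/2
    return word.lower()


def preprocess_pos(pos):    # cf. ptb_preprocess_pos/2
    return pos


def tokenize(line):
    """Split a bracketed line into '(' , ')' and symbol tokens."""
    return line.replace("(", " ( ").replace(")", " ) ").split()


def parse(tokens, i=0):
    """Parse the constituent opening at tokens[i].

    Returns (tree_or_None, next_index); None means the constituent
    was dropped because it contained only deleted POSs.
    """
    assert tokens[i] == "("
    label = tokens[i + 1]
    i += 2
    if tokens[i] != "(":
        # Preterminal of the form (POS word).
        word = tokens[i]
        assert tokens[i + 1] == ")"
        pos = preprocess_pos(label)
        if pos in DELETED_POS:
            return None, i + 2
        leaf = {"sym": pos, "word": preprocess_word(word),
                "empty": pos == EMPTY_CATEGORY}
        return leaf, i + 2
    dtrs = []
    while tokens[i] == "(":
        child, i = parse(tokens, i)
        if child is not None:
            dtrs.append(child)
    assert tokens[i] == ")"
    if not dtrs:
        # Every daughter was deleted, so drop this subtree as well.
        return None, i + 1
    return {"sym": label, "dtrs": dtrs}, i + 1


def read_tree(line):
    tree, _ = parse(tokenize(line))
    return tree


tree = read_tree("(S (NP (DT The) (NN dog)) (VP (VBZ barks)) (. .))")
```

In this sketch the sentence-final period is dropped because its POS is listed as deleted, and words are lowercased by the word hook.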
How to write tree conversion rules
Parse trees are converted in the following steps.
- Preprocessing
- Tree conversion by pattern rules
- Stemming
First, the following interfaces may be used for preprocessing an input
tree before applying conversion rules.
delete_tree(+$Tree)
+$Tree | tree: parse tree
Remove a subtree that is unifiable with $Tree.

nonterminal_mapping(+$InSym, -$OutSym)
+$InSym | nonterminal symbol of an input tree
-$OutSym | nonterminal symbol of an output tree
Convert nonterminal symbol $InSym into $OutSym.

preterminal_mapping(+$InSurface, +$InSym, -$OutSurface, -$OutSym)
+$InSurface | input word (surface form)
+$InSym | input nonterminal symbol
-$OutSurface | output word (surface form)
-$OutSym | output nonterminal symbol
Convert a word, $InSurface/$InSym, into $OutSurface/$OutSym.

preterminal_projection(+$InSym, -$NewSym)
+$InSym | preterminal symbol
-$NewSym | nonterminal symbol
Insert the nonterminal symbol $NewSym as the mother of preterminal
$InSym.
Pattern rules are implemented as LiLFeS programs using the interfaces
defined in "treetrans.lil". Parse trees are represented as feature
structures defined in "treetypes.lil". For example, the following
pattern rule converts a tree like "(... than/IN XXX)" into "(... (PP
than/IN XXX:argument))".
tree_transform_class("than", "topdown", "weak").
tree_subst_pattern("than",
                   TREE_NODE\$Node & TREE_DTRS\$Dtrs,
                   TREE_NODE\$Node & TREE_DTRS\$NewDtrs) :-
    $Dtrs = [$Left & tree_any & ANY_TREES\[_|_],
             $Than & tree & TREE_NODE\(SYM\"IN" & WORD\SURFACE\"than"),
             $Right & tree & TREE_NODE\HEAD_MARK\argument],
    $NewDtrs = [$Left,
                TREE_NODE\(SYM\"PP" & MOD\[] & ID\[] & HEAD_MARK\modifier) &
                TREE_DTRS\[$Than, $Right]].
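The effect of this rule on a single node can be mimicked in Python as follows. This is a rough sketch and not a translation of the LiLFeS semantics: the dictionary layout and the name than_rule are hypothetical, and it assumes that the "[_|_]" pattern requires at least one daughter to the left of "than".

```python
def than_rule(node):
    """Group 'than/IN X:argument' at the end of the daughters into a PP.

    Returns the rewritten node, or None when the pattern does not match.
    """
    dtrs = node.get("dtrs", [])
    if len(dtrs) < 3:
        # Need at least one left daughter, 'than', and the argument.
        return None
    *left, than, right = dtrs
    if than.get("sym") != "IN" or than.get("word") != "than":
        return None
    if right.get("mark") != "argument":
        return None
    pp = {"sym": "PP", "mark": "modifier", "dtrs": [than, right]}
    # The mother node itself is kept; only its daughters change.
    return {**node, "dtrs": left + [pp]}


adjp = {
    "sym": "ADJP",
    "dtrs": [
        {"sym": "JJR", "word": "bigger"},
        {"sym": "IN", "word": "than"},
        {"sym": "NP", "mark": "argument", "dtrs": []},
    ],
}
converted = than_rule(adjp)
```

Here the three daughters of the ADJP become two: the comparative adjective and a new PP that dominates "than" and its argument.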
First, write "tree_transform_class/3" to specify the name of a
conversion rule, the order of rule application, and the behavior
when the rule application fails.
tree_transform_class(+$Name, +$Direction, +$Strict)
+$Name | The name of the conversion rule
+$Direction | The order of applying the rule
  - "topdown": From the root to the leaves
  - "bottomup": From the leaves to the root
  - "rootonly": Only to the root of the tree
+$Strict | The behavior when the rule application fails
  - "strict": Fail the conversion of the whole tree
  - "weak": Ignore the failure of this rule
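The three directions and the two failure modes can be pictured with a small Python model. It is a sketch of one plausible reading of the table above, not the traversal code of treetrans: trees are hypothetical (symbol, daughters) pairs, a rule is a function that returns a new node or None on failure, and the names convert and ConversionFailed are made up for illustration.

```python
class ConversionFailed(Exception):
    """Raised when a "strict" rule fails at some node."""


def apply_at(node, rule, mode):
    result = rule(node)
    if result is None:
        if mode == "strict":
            raise ConversionFailed(node[0])
        return node  # "weak": keep the node unchanged
    return result


def convert(tree, rule, direction, mode):
    if direction == "rootonly":
        return apply_at(tree, rule, mode)
    if direction == "topdown":
        # Rewrite the node first, then descend into the result.
        sym, dtrs = apply_at(tree, rule, mode)
        return (sym, [convert(d, rule, direction, mode) for d in dtrs])
    if direction == "bottomup":
        # Rewrite the daughters first, then the node itself.
        sym, dtrs = tree
        node = (sym, [convert(d, rule, direction, mode) for d in dtrs])
        return apply_at(node, rule, mode)
    raise ValueError(direction)


def visit_order(tree, direction):
    """Return the symbols in the order a rule would be tried on them."""
    visited = []

    def logging_rule(node):
        visited.append(node[0])
        return None  # always "fails", which "weak" ignores

    convert(tree, logging_rule, direction, "weak")
    return visited


example = ("S", [("NP", []), ("VP", [("V", [])])])
topdown = visit_order(example, "topdown")
bottomup = visit_order(example, "bottomup")
rootonly = visit_order(example, "rootonly")
```

With the same always-failing rule, passing "strict" instead of "weak" would abort the conversion of the whole tree at the first node visited.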
Next, write conversion rules with the following interfaces. In all
the interfaces, the first argument is the name of a rule that has been
declared with "tree_transform_class/3".
The treetrans tool traverses each node of a parse tree and applies
the conversion rules in the order in which their
"tree_transform_class/3" declarations appear in the program file.
tree_ignore(+$Name, ?$Tree)
+$Name | rule name
?$Tree | tree: parse tree
Remove a subtree that is unifiable with $Tree.

tree_transform_rule(+$Name, +$InTree, -$OutTree)
+$Name | rule name
+$InTree | tree: input parse tree
-$OutTree | tree: output parse tree
Convert $InTree into $OutTree.

tree_subst_pattern(+$Name, +$InPattern, +$OutPattern)
+$Name | rule name
+$InPattern | tree: pattern of an input tree
+$OutPattern | tree: pattern of an output tree
Convert a parse tree that matches $InPattern (using "tree_match/2")
into $OutPattern.

tree_unify(+$Name, ?$Tree)
+$Name | rule name
?$Tree | tree: parse tree
Unify $Tree with the target tree.

tree_match_pattern(+$Name, +$Pattern)
+$Name | rule name
+$Pattern | tree: pattern on a parse tree
Unify $Pattern with the target tree using "tree_match/2".
Conversion rules are applied in the order in which they are declared
with tree_transform_class/3.
For each conversion rule, the interfaces are tried in the order
tree_ignore/2, tree_transform_rule/3, tree_subst_pattern/3,
tree_unify/2, tree_match_pattern/2.
If the conversion by one interface succeeds, the remaining interfaces
are not tried for the same rule.
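The "first success wins" behavior described above can be sketched in Python as a simple ordered dispatch. This only models the ordering; the handlers dictionary and the function apply_named_rule are hypothetical, and each handler is assumed to return None when its conversion fails.

```python
# Order in which the interfaces are tried for one rule name.
INTERFACE_ORDER = [
    "tree_ignore",
    "tree_transform_rule",
    "tree_subst_pattern",
    "tree_unify",
    "tree_match_pattern",
]


def apply_named_rule(handlers, tree):
    """Try each defined interface in order and stop at the first success.

    handlers maps an interface name to a function tree -> result,
    where None signals failure. Returns (interface_name, result),
    or (None, None) if every interface failed.
    """
    for name in INTERFACE_ORDER:
        handler = handlers.get(name)
        if handler is None:
            continue  # this interface is not defined for the rule
        result = handler(tree)
        if result is not None:
            return name, result
    return None, None


calls = []


def failing_transform(tree):
    calls.append("tree_transform_rule")
    return None


def working_subst(tree):
    calls.append("tree_subst_pattern")
    return tree + ["rewritten"]


def unify(tree):
    calls.append("tree_unify")
    return tree


handlers = {
    "tree_unify": unify,
    "tree_subst_pattern": working_subst,
    "tree_transform_rule": failing_transform,
}
used, result = apply_named_rule(handlers, ["S"])
```

In this run the transform handler is tried and fails, the substitution handler succeeds, and the unify handler is never called.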
In conversion rules, you can use several tools such as
"tree_binarize/2" (implemented in "binarizer.lil") to binarize a tree,
and "mark_head/1" and "mark_modifier/1" (defined in "markhead.lil") to
annotate head/modifier/argument marks.
MIYAO Yusuke (yusuke@is.s.u-tokyo.ac.jp)