The grammar of ENJU is developed by transforming the phrase structure trees of the Penn Treebank into HPSG-style phrase structure trees. This transformation is done with treetrans, a tool included in mayz. For more details on treetrans, refer to the mayz manual.
treetrans <rule module> <input file> <output database>

| Argument | Description |
|---|---|
| rule module | LiLFeS file that contains the rules for transforming phrase structure trees |
| input file | input treebank (text format) |
| output database | output treebank (lildb format) |
For the ENJU grammar, the input file is a text file in which each line holds one Penn Treebank phrase structure tree. A line of the input file looks as follows:
(S (NP-SBJ Ms./NNP Haag/NNP) (VP plays/VBZ (NP Elianti/NNP)) ./.)
treetrans reads this text and builds a phrase structure tree represented with the feature structures defined in "treetypes.lil", which comes with mayz. These feature structures are then transformed by the pattern rules in the rule module and supplemented with additional information.
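Internally, the processing of phrase structure trees is divided into the following steps:
1. Conversion of the text-format input tree into a feature structure
2. Preprocessing
3. Application of pattern rules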
First, the phrase structure tree given in text format is converted into a phrase structure tree represented as a feature structure. This is done by the predicate input_parse_tree/2, which receives one line of the input file.
Since ENJU uses the Penn Treebank as its input treebank, it can rely on the predicate input_ptb_parse_tree/2 that comes with mayz: ENJU's definition of input_parse_tree/2 calls input_ptb_parse_tree/2 in its body.
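To picture what this conversion does, the Python sketch below parses one line in the format shown above into a nested tree. It is an illustration only, not the mayz implementation: it assumes leaves are written word/POS and that function tags follow the category after a hyphen (as in NP-SBJ), and the Node and Leaf classes are hypothetical stand-ins for the feature structures of "treetypes.lil".

```python
from __future__ import annotations

import re
from dataclasses import dataclass, field


@dataclass
class Leaf:
    word: str  # surface form, e.g. "Haag"
    pos: str   # part-of-speech tag, e.g. "NNP"


@dataclass
class Node:
    sym: str                                       # category, e.g. "NP"
    func: list[str] = field(default_factory=list)  # function tags, e.g. ["SBJ"]
    dtrs: list = field(default_factory=list)       # daughters: Node or Leaf


# Tokens are "(", ")", or any run of characters without spaces or brackets.
TOKEN = re.compile(r"\(|\)|[^\s()]+")


def parse_ptb_line(line: str) -> Node:
    """Parse one bracketed tree such as "(S (NP-SBJ Ms./NNP) ...)"."""
    tokens = TOKEN.findall(line)
    pos = 0

    def parse_node() -> Node:
        nonlocal pos
        if tokens[pos] != "(":
            raise ValueError(f"expected '(' at token {pos}")
        sym, *func = tokens[pos + 1].split("-")  # "NP-SBJ" -> "NP", ["SBJ"]
        node = Node(sym, func)
        pos += 2
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                node.dtrs.append(parse_node())
            else:
                # Split at the last "/" so that "./." gives word "." and POS ".".
                word, _, tag = tokens[pos].rpartition("/")
                node.dtrs.append(Leaf(word, tag))
                pos += 1
        pos += 1  # consume the closing ")"
        return node

    tree = parse_node()
    if pos != len(tokens):
        raise ValueError("trailing tokens after the tree")
    return tree


tree = parse_ptb_line("(S (NP-SBJ Ms./NNP Haag/NNP) (VP plays/VBZ (NP Elianti/NNP)) ./.)")
print(tree.dtrs[0])
# Node(sym='NP', func=['SBJ'], dtrs=[Leaf(word='Ms.', pos='NNP'), Leaf(word='Haag', pos='NNP')])
```

Like input_ptb_parse_tree/2, this sketch only changes the representation; the bracketing itself is left as it was.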
input_ptb_parse_tree/2 only converts the input tree into feature structures; it does not change the phrase structure of the tree itself. Leaf nodes can be modified through the following predicates: ptb_empty_category/1, ptb_preprocess_pos/2, ptb_delete_pos/1, and ptb_preprocess_word/2. Their roles are as follows:
For ENJU, the following is specified for the phrase structure trees supplied as input:
During preprocessing, phrase structure trees are reshaped before the pattern rules are applied.
Preprocessing traverses each phrase structure tree breadth-first and performs the following operations at each node:
They specify the following:
In ENJU, these interfaces are used in "devel/transmain.lil", for example:
nonterminal_mapping("NAC", "NP").
preterminal_mapping("%", "NN", "%", "%").
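As an illustration of breadth-first preprocessing, the Python sketch below walks a tree level by level and renames nonterminal symbols from a table. It assumes, from the example above, that nonterminal_mapping(From, To) maps the nonterminal From to To (so NAC becomes NP). The table, the Tree class, and the function name are hypothetical, and preterminal_mapping/4 is left out because the meaning of its four arguments is not described here.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Tree:
    sym: str                                  # category or POS tag
    dtrs: list = field(default_factory=list)  # empty for preterminals


# Hypothetical counterpart of nonterminal_mapping("NAC", "NP").
NONTERMINAL_MAPPING = {"NAC": "NP"}


def preprocess_breadth_first(root: Tree, mapping: dict = NONTERMINAL_MAPPING) -> list:
    """Rename nonterminals in breadth-first order and return the visit order."""
    visited = []
    queue = deque([root])
    while queue:
        node = queue.popleft()
        if node.dtrs:  # only nonterminal nodes are renamed
            node.sym = mapping.get(node.sym, node.sym)
        visited.append(node.sym)
        queue.extend(node.dtrs)  # daughters wait until the current level is done
    return visited


root = Tree("S", [Tree("NAC", [Tree("NNP"), Tree("NNP")]), Tree("VP", [Tree("VBZ")])])
print(preprocess_breadth_first(root))  # ['S', 'NP', 'VP', 'NNP', 'NNP', 'VBZ']
```

A FIFO queue is what makes the traversal breadth-first: every node on one level is handled before any of its daughters.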
Pattern rules are then applied to the preprocessed phrase structure trees, and the transformed trees are stored in the output database.
The goal of this step is to construct HPSG-style phrase structure trees from the preprocessed phrase structure trees. An example pattern rule, for "than" phrases, is given below.
Pattern rules used for transformation are defined by the following interface predicates: tree_transform_class/3, tree_ignore/2, tree_transform_rule/3, tree_subst_pattern/3, tree_unify/2, and tree_match_pattern/2. Each pattern rule must first be declared with tree_transform_class/3, which specifies the name of the rule, the order in which it is applied to the tree, and the operation performed after it is applied. Pattern rules are applied in the order in which they are declared.
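tree_transform_class(+$Name, +$Direction, +$Strict)

| Argument | Description |
|---|---|
| +$Name | Name of the pattern rule |
| +$Direction | Order of rule application (e.g. "topdown") |
| +$Strict | Operation after rule application (e.g. "weak") |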
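The effect of declaration order and of the $Direction argument can be sketched in Python as follows. This is a simplified model, not the mayz implementation: each rule is a hypothetical function that returns a replacement subtree or None, "topdown" is taken to mean that a node is rewritten before its daughters, any other direction value is treated as bottom-up (an assumption), and the $Strict argument is not modelled.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class Tree:
    sym: str
    dtrs: list = field(default_factory=list)


# A rule returns a replacement subtree, or None when it does not match.
Rule = Callable[[Tree], Optional[Tree]]


def apply_rule(tree: Tree, direction: str, rule: Rule) -> Tree:
    if direction == "topdown":
        tree = rule(tree) or tree  # rewrite this node first, then its daughters
        tree.dtrs = [apply_rule(d, direction, rule) for d in tree.dtrs]
    else:  # assumed bottom-up: daughters first, then this node
        tree.dtrs = [apply_rule(d, direction, rule) for d in tree.dtrs]
        tree = rule(tree) or tree
    return tree


def transform(tree: Tree, rules: list[tuple[str, str, Rule]]) -> Tree:
    """Apply each (name, direction, rule) to the whole tree in declaration order."""
    for _name, direction, rule in rules:
        tree = apply_rule(tree, direction, rule)
    return tree


def rename(old: str, new: str) -> Rule:
    """Hypothetical example rule: relabel nodes whose symbol is `old`."""
    return lambda t: Tree(new, t.dtrs) if t.sym == old else None


rules = [("x_to_y", "topdown", rename("X", "Y")), ("y_to_z", "topdown", rename("Y", "Z"))]
print(transform(Tree("S", [Tree("X")]), rules).dtrs[0].sym)  # Z
```

Reversing the two declarations leaves the node as Y, because "y_to_z" then runs before any Y exists; this is why the order of tree_transform_class/3 declarations matters.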
The processing performed by a pattern rule is specified with the following predicates: tree_ignore/2, tree_transform_rule/3, tree_subst_pattern/3, tree_unify/2, and tree_match_pattern/2. They process a rule as follows:
ENJU defines pattern rules as shown below. This rule specifies the structure of a "than" phrase: it transforms bracketing of the form (... than/IN XXX) into (... (PP than/IN XXX:argument)).

% Declare the rule "than": applied top-down, in "weak" mode.
tree_transform_class("than", "topdown", "weak").
tree_subst_pattern("than",
    TREE_NODE\$Node & TREE_DTRS\$Dtrs,
    TREE_NODE\$Node & TREE_DTRS\$NewDtrs) :-
    % $Left: one or more daughters before "than"; $Right: the argument after it.
    $Dtrs = [$Left & tree_any & ANY_TREES\[_|_],
             $Than & tree & TREE_NODE\(SYM\"IN" & WORD\SURFACE\"than"),
             $Right & tree & TREE_NODE\HEAD_MARK\argument],
    % Wrap "than" and its argument in a new PP node that acts as a modifier.
    $NewDtrs = [$Left,
                TREE_NODE\(SYM\"PP" & FUNC\[] & ID\[] & HEAD_MARK\modifier_non_empty) &
                TREE_DTRS\[$Than, $Right]].

After the transformation is complete, check that the correct value is assigned to the TREE_NODE\SCHEMA_NAME\ feature of each node in the phrase structure tree. This feature gives the name of the schema applied to the daughters of the node; this information is extracted in the next step.
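The "than" rule can be restated in Python to make the matching explicit. This is an illustrative sketch, not the mayz pattern matcher: the Tree class and its fields are hypothetical stand-ins for SYM, WORD\SURFACE, HEAD_MARK and TREE_DTRS, FUNC and ID are omitted, and tree_any is read as a sequence of one or more trees (ANY_TREES\[_|_]). Under that reading, "than"/IN and the argument must be the last two daughters, and they are wrapped in a new PP node marked modifier_non_empty.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Tree:
    sym: str                         # SYM
    word: Optional[str] = None       # WORD\SURFACE (preterminals only)
    head_mark: Optional[str] = None  # HEAD_MARK
    dtrs: list = field(default_factory=list)


def than_rule(node: Tree) -> Optional[Tree]:
    """(... than/IN XXX:argument) -> (... (PP than/IN XXX:argument)), or None."""
    d = node.dtrs
    matches = (
        len(d) >= 3  # $Left must cover at least one daughter
        and d[-2].sym == "IN" and d[-2].word == "than"
        and d[-1].head_mark == "argument"
    )
    if not matches:
        return None
    pp = Tree("PP", head_mark="modifier_non_empty", dtrs=[d[-2], d[-1]])
    # The mother node ($Node) is kept; only its daughters change.
    return Tree(node.sym, node.word, node.head_mark, d[:-2] + [pp])


qp = Tree("QP", dtrs=[Tree("JJR", "more"), Tree("IN", "than"),
                      Tree("CD", "five", head_mark="argument")])
print([t.sym for t in than_rule(qp).dtrs])  # ['JJR', 'PP']
```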