Here we explain the unigram model used for disambiguation during parsing.
The unigram model assigns lexical entries to words. Let w be a word, s the string (sentence) containing the word, and l a lexical entry; the model estimates p(l|w,s). The probabilistic model of ENJU is a Maximum Entropy model, and training finds the best weight for each feature (for details, please refer to Probabilistic Model). Training is done with the Maximum Entropy estimator Amis and the following tools that work with Amis: unimaker and amisfilter.
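As an illustration of what "weights for features" means in a Maximum Entropy model, the following minimal Python sketch computes p(l|w,s) over a set of candidate lexical entries using the standard log-linear form. The function name, the feature strings, and the weight values are hypothetical; how Amis itself stores and applies weights is not covered here.

```python
import math

def maxent_distribution(candidates, weights):
    """Return p(l | w, s) for each candidate lexical entry l.

    candidates: dict mapping a lexical entry name to the list of feature
                strings active for that entry in the context (w, s).
    weights:    dict mapping a feature string to its weight.
    Both structures are illustrative assumptions, not Amis data formats.
    """
    # Unnormalized log-score: the sum of the weights of the active features.
    scores = {
        entry: sum(weights.get(feature, 0.0) for feature in features)
        for entry, features in candidates.items()
    }
    # Subtract the maximum score before exponentiating for numerical stability.
    top = max(scores.values())
    exp_scores = {entry: math.exp(s - top) for entry, s in scores.items()}
    normalizer = sum(exp_scores.values())
    return {entry: value / normalizer for entry, value in exp_scores.items()}
```

The entry with the largest total feature weight receives the highest probability, which is how the model prefers one lexical entry over another for the same word.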
The procedure is as follows:
A probabilistic event is a string of fields separated by //, like the following:
ms-period-//NNP//[D< N.3sg>]_lxm-noun_adjective_rule//haag//NNP//haag//NNP//uni
The last field indicates the category of the event. The other fields are symbols that represent features of the event. unimaker outputs a word and the probabilistic event corresponding to the assignment of a lexical entry to it.
amisfilter applies a mask to a probabilistic event according to the category of the event. A mask determines which of the fields are used together to create a certain feature. For the above probabilistic event, applying the mask (0,1,1,1,1,1,1) creates the following feature:
_//NNP//[D< N.3sg>]_lxm-noun_adjective_rule//haag//NNP//haag//NNP//uni
amisfilter extracts features from probabilistic events in this way and generates data files in Amis format. Finally, Amis calculates the best weight for each of these features.
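The masking step described above can be sketched in a few lines of Python. This is only an illustration of the idea, not the amisfilter implementation: the function name and its parameters are hypothetical, and it assumes that a masked-out field is replaced by "_" and that the final category field is always kept, as in the example above.

```python
def apply_mask(event, mask, sep="//", blank="_"):
    """Apply a 0/1 mask to the feature fields of a probabilistic event.

    event: event string whose last field is the category.
    mask:  sequence of 0/1 values, one per feature field (category excluded).
    """
    fields = event.split(sep)
    # The last field is the event category; the rest are feature fields.
    *feature_fields, category = fields
    if len(mask) != len(feature_fields):
        raise ValueError("mask length must equal the number of feature fields")
    # Keep a field where the mask is 1; blank it out where the mask is 0.
    masked = [f if keep else blank for f, keep in zip(feature_fields, mask)]
    return sep.join(masked + [category])

event = "ms-period-//NNP//[D< N.3sg>]_lxm-noun_adjective_rule//haag//NNP//haag//NNP//uni"
feature = apply_mask(event, (0, 1, 1, 1, 1, 1, 1))
```

Applying several different masks to the same event yields several features, each combining a different subset of the fields.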
Below we explain each of the steps involved.
Now let us explain how to use unimaker to output probabilistic events to files.
unimaker [Name of Model] [Event Extraction Module] [Derivbank] [Event File]
Name of Model | The name of the probabilistic model (used for parsing)
Event Extraction Module | The LiLFeS module that includes the predicate for event extraction
Derivbank | The derivbank resulting from grammar acquisition (lildb format)
Event File | The file to which probabilistic events are output (in text format, or compressed in gz/bz format)
The TERM_TEMPLATE feature of each terminal node of a derivation tree is assigned the name of a lexical entry template (a combination of the name of a lexeme template and the history of lexical rule applications). unimaker generates probabilistic events from the word information of each terminal node (the 'word'-typed value assigned to the TERM_WORD feature), the name of its lexical entry template (the value of the TERM_TEMPLATE feature), and the word information of nearby terminal nodes.
um_derivation_to_word_lattice(derivation_internal & DERIV_DTRS\$Dtrs, $WordLattice) :-
    um_derivation_to_word_lattice_dtrs($Dtrs, $WordLattice). %% recursive predicate
um_derivation_to_word_lattice(derivation_terminal & TERM_WORD\$Word,
                              [left_position\$LPos & right_position\$RPos & word\$LexEntry]) :-
    $LexEntry = [$Word],
    $Word = POSITION\$LPos,
    $RPos is $LPos + 1.

um_derivation_to_deriv_word_lattice(derivation_internal & DERIV_DTRS\$Dtrs, $WordLattice) :-
    um_derivation_to_deriv_word_lattice_dtrs($Dtrs, $WordLattice). %% recursive predicate
um_derivation_to_deriv_word_lattice(derivation_terminal & $Term & TERM_WORD\$Word,
                                    [left_position\$LPos & right_position\$RPos & word\$LexEntry]) :-
    $LexEntry = $Term,
    $Word = POSITION\$LPos,
    $RPos is $LPos + 1.

um_correct_lexical_entry(TERM_WORD\$Word & LEXENTRY_SIGN\$Sign, $LexName) :-
    lookup_lexicon($Word, $TempNameList),
    member($TempName, $TempNameList),
    lookup_template($TempName, $LexEntry),
    equivalent($LexEntry, $Sign),
    !,
    $LexName = LEX_WORD\$Word & LEX_TEMPLATE\$TempName.

um_complement_lexical_entry(TERM_WORD\$Word & LEXENTRY_SIGN\$Sign, $LexName) :-
    lookup_lexicon($Word, $TempNameList1),
    check_coverage($TempNameList1, $Sign, $TempName1), %% check whether $TempNameList1 contains
                                                       %% elements that carry $Sign
    findall($Lex,
            (member($TN, $TempNameList1),
             $TN \= $TempName1,
             $Lex = LEX_WORD\$Word & LEX_TEMPLATE\$TN),
            $LexList),
    member(LEX_TEMPLATE\$TempName, $LexList),
    $LexName = LEX_WORD\$Word & LEX_TEMPLATE\$TempName.

The extract_lexical_event/4 predicate, which is used for extracting probabilistic events, is found in "grammar/unievent.lil". Probabilistic events of the category "uni" are extracted by this predicate. The fields of these events are assembled by the following definition:
extract_lexical_event("hpsg-uni", "uni", $LexEntry, $Event) :-
    $LexEntry = (LEX_WORD\ (SURFACE\ $Surface &
                            POS\ $Pos &
                            BASE\ $Base &
                            BASE_POS\ $BasePOS &
                            POSITION\ $Position) &
                 LEX_TEMPLATE\($LexTemplate & LEXEME_NAME\$LexemeName)),
    lex_template_label($LexTemplate, $LexName),
    $PositionN2 is $Position - 2,
    $PositionN1 is $Position - 1,
    $PositionP1 is $Position + 1,
    $PositionP2 is $Position + 2,
    $PositionP3 is $Position + 3,
    $PositionP4 is $Position + 4,
    lexical_event($PositionN2, $PositionN1, $Event, $Event2),   %% -2
    lexical_event($PositionN1, $Position, $Event2, $Event3),    %% -1
    $Event3 = [$Surface, $Pos, $LexName, $Base, $BasePOS, $LexemeName|$Event4],
    lexical_event($PositionP1, $PositionP2, $Event4, $Event5),  %% 1
    lexical_event($PositionP2, $PositionP3, $Event5, $Event6),  %% 2
    lexical_event($PositionP3, $PositionP4, $Event6, []).       %% 3
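The order in which extract_lexical_event/4 assembles an event (two preceding positions, the six fields of the target word, then three following positions) can be mirrored in a rough Python sketch. The definition of lexical_event/4 is not shown here, so what each neighboring position contributes is an assumption: this sketch uses the neighbor's surface form and POS, and pads positions outside the sentence with "_". The function name, the dictionary keys, and the padding symbol are all hypothetical.

```python
def context_event(words, i, template_label, lexeme_name, pad=("_", "_")):
    """Build the field list for the word at index i.

    words: list of dicts with keys "surface", "pos", "base", "base_pos".
    template_label, lexeme_name: labels of the lexical entry being assigned.
    """
    def neighbor(j):
        # ASSUMPTION: a neighbor contributes its surface form and POS;
        # positions outside the sentence contribute padding symbols.
        if 0 <= j < len(words):
            return [words[j]["surface"], words[j]["pos"]]
        return list(pad)

    w = words[i]
    event = []
    event += neighbor(i - 2)                      # position -2
    event += neighbor(i - 1)                      # position -1
    # Target word fields, in the same order as in $Event3 above.
    event += [w["surface"], w["pos"], template_label,
              w["base"], w["base_pos"], lexeme_name]
    for offset in (1, 2, 3):                      # positions +1, +2, +3
        event += neighbor(i + offset)
    return event
```

The window is asymmetric, matching the predicate above: two words of left context and three words of right context.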
The event file output by unimaker looks like the following.
(One event is output per line. In the case of event_2_0, there are 3 probabilistic events.)
Let us illustrate how to use amisfilter to apply masks to the probabilistic events output above and extract features for generating a data file in Amis format.
The actual processing is as follows:
The feature_mask/3 predicate is found in "grammar/lexmask.lil". The mask