Enju is an accurate natural language parser for English. With a wide-coverage probabilistic HPSG grammar [1-7] and an efficient parsing algorithm [8-11], the parser analyzes the syntactic and semantic structures of English sentences and provides users with phrase structures and predicate-argument structures. These outputs are especially useful for high-level NLP applications, including information extraction, automatic summarization, question answering, and machine translation, where the "meaning" of a sentence plays a central role.
This repository also includes the code for the Japanese CCG parser [19-21] and the Chinese HPSG parser [17-18]. The Japanese CCG parser is available as Jigg.
The main features of the Enju parser are:
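- Accurate deep analysis: the parser outputs both phrase structures and predicate-argument structures.
- High speed: efficient parsing by default, with a high-speed setting also available (mogura).

Other useful features are: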
- XML output (option -xml). The parser adds XML tags to the original text, which is useful when parse results are merged with other processing results (e.g. named entities). A stand-off format is also available (option -so).
- A model for biomedical text (option -genia).
- A model for general English text (option -brown).
- Supertagging: mogura -super
- Conversion to Penn Treebank style: enju2ptb/convert < ENJU_XML_OUTPUT > PTB_STYLE_OUTPUT
- Option -A. Parsing accuracy improves, while parsing speed gets slower.
- Option -N. This is an experimental function, and parsing speed gets slower.
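
As a rough sketch of how two of these features might be combined, the shell function below runs the parser with -xml and then passes the result to enju2ptb/convert. It assumes that enju is on PATH and reads plain text from standard input, and that the script is run from the directory containing enju2ptb/. The function name enju_to_ptb and the variables ENJU and ENJU2PTB are hypothetical and not part of Enju.

```shell
#!/bin/sh
# Hypothetical helper: parse a text file, then convert the result to PTB style.
# ENJU and ENJU2PTB are illustrative variables, not part of Enju itself.
ENJU="${ENJU:-enju}"                      # assumption: enju is on PATH
ENJU2PTB="${ENJU2PTB:-enju2ptb/convert}"  # converter path as given in this README

enju_to_ptb() {
    input="$1"                  # plain-text input (assumed one sentence per line)
    output="$2"                 # destination for PTB-style output
    xml="${output}.enju.xml"    # intermediate XML, kept for inspection

    # Step 1: produce XML output with the -xml option.
    "$ENJU" -xml < "$input" > "$xml" || return 1

    # Step 2: convert the XML output to Penn Treebank style.
    "$ENJU2PTB" < "$xml" > "$output"
}
```

With these assumptions, a call such as enju_to_ptb input.txt output.ptb would write the converted output to output.ptb and leave the intermediate XML in output.ptb.enju.xml. Keeping the XML file is a design choice here, since the same XML can also be merged with other processing results.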