Question Answering (QA) is a principal application of natural language processing, and many researchers have devoted intensive effort to it. However, most QA systems developed so far rely on a black-box process to obtain their final answers. Such systems prevent users from judging whether the answers are appropriate, and their practical usefulness has been limited.
This dataset was created to support the development of QA systems that can explain the entire process of obtaining an answer. The current version targets simple questions that can be answered with an encyclopedia, and each question is annotated with meta information intended to help build such systems.
Human annotators wrote the question texts by hand, on the assumption that each could be answered using an encyclopedia. The development set includes 800 questions, while the test set includes 200 questions.
In addition to the questions and their answers, we manually annotated meta information such as question types, clues for obtaining answers, and the Wikipedia pages in which answers can be found. We also annotated SPARQL queries for the questions that can be answered using JWO (Japanese Wikipedia Ontology), developed in Yamaguchi Lab. at Keio Univ.
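As a rough illustration of the annotations listed above, the following Python sketch models one annotated question in memory. It is not the dataset's actual data format, which is specified in the definitions document: the class name, every field name, and the placeholder records are assumptions made up for this example.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class AnnotatedQuestion:
    """Hypothetical in-memory view of one annotated question.

    All field names are assumptions for illustration; the real
    format is defined in the dataset's definitions document.
    """
    question: str
    answer: str
    question_type: str
    clues: List[str] = field(default_factory=list)
    wikipedia_page: Optional[str] = None
    # Present only for questions answerable with the ontology (JWO).
    sparql_query: Optional[str] = None


def answerable_with_ontology(records: List[AnnotatedQuestion]) -> List[AnnotatedQuestion]:
    """Return the records that carry a SPARQL query annotation."""
    return [r for r in records if r.sparql_query]


# Placeholder records, not taken from the dataset.
examples = [
    AnnotatedQuestion(
        question="<placeholder question 1>",
        answer="<placeholder answer 1>",
        question_type="<placeholder type>",
        clues=["<placeholder clue>"],
        wikipedia_page="<placeholder page title>",
        sparql_query="SELECT ?x WHERE { ?x ?p ?o } LIMIT 1",
    ),
    AnnotatedQuestion(
        question="<placeholder question 2>",
        answer="<placeholder answer 2>",
        question_type="<placeholder type>",
    ),
]

with_queries = answerable_with_ontology(examples)
print(len(with_queries), "of", len(examples), "records have a SPARQL query")
```

Keeping the SPARQL query optional mirrors the description above, where only questions answerable with JWO carry a query.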
See Definitions for Question Answering Tagged Dataset (in Japanese only) for the details of meta information and the data format.
When you publish a paper using this dataset, please cite
This dataset is distributed under Creative Commons License CC-BY-SA.
This work was partially supported by JST PRESTO.