RANIS - Relational representation of context-dependent roles on information science papers

GitHub repository

as of 2017-08-28

This corpus is a set of research abstracts in the information science domain, in which mentions of named entities and domain-specific entities are marked, and their roles in the context of the paper are annotated as relations to other entities. There are two subcorpora: the Japanese corpus, consisting of 230 abstracts from the IPSJ (Information Processing Society of Japan) Journal, and the English corpus, consisting of 250 abstracts from the ACL Anthology and 150 abstracts from the SemEval-2010 Task 5 set (part of the ACM Digital Library).

The corpus was created using brat, and the annotations are in the brat standoff format (see http://brat.nlplab.org/standoff.html). The configuration files for viewing the data with brat are also included in the corpus.
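As a rough illustration of how the standoff annotations can be read, the following is a minimal sketch of a parser for text-bound annotations (`T` lines) and binary relations (`R` lines) in the brat standoff format. The function and class names are illustrative, not part of this repository, and the labels `TERM` and `APPLY_TO` in the sample are placeholders rather than labels from the RANIS scheme; consult the annotation guidelines for the actual label set.

```python
from typing import Dict, List, NamedTuple, Tuple


class Entity(NamedTuple):
    id: str
    label: str
    spans: List[Tuple[int, int]]  # character offsets; several if discontinuous
    text: str


class Relation(NamedTuple):
    id: str
    label: str
    arg1: str
    arg2: str


def parse_ann(ann_text: str) -> Tuple[Dict[str, Entity], Dict[str, Relation]]:
    """Parse T (text-bound) and R (relation) lines of a brat .ann file."""
    entities: Dict[str, Entity] = {}
    relations: Dict[str, Relation] = {}
    for line in ann_text.splitlines():
        if not line.strip():
            continue
        fields = line.split("\t")
        ann_id = fields[0]
        if ann_id.startswith("T"):
            # Format: ID <TAB> LABEL START END[;START END...] <TAB> TEXT
            label, offsets = fields[1].split(" ", 1)
            spans = [
                (int(part.split()[0]), int(part.split()[1]))
                for part in offsets.split(";")
            ]
            text = fields[2] if len(fields) > 2 else ""
            entities[ann_id] = Entity(ann_id, label, spans, text)
        elif ann_id.startswith("R"):
            # Format: ID <TAB> LABEL Arg1:Tx Arg2:Ty
            parts = fields[1].split()
            args = dict(p.split(":", 1) for p in parts[1:])
            relations[ann_id] = Relation(ann_id, parts[0], args["Arg1"], args["Arg2"])
        # Other annotation types (events, attributes, notes) are skipped here.
    return entities, relations


# Made-up sample; labels are placeholders, not RANIS labels.
sample_txt = "We apply a parser to abstracts."
sample_ann = (
    "T1\tTERM 11 17\tparser\n"
    "T2\tTERM 21 30\tabstracts\n"
    "R1\tAPPLY_TO Arg1:T1 Arg2:T2\n"
)
entities, relations = parse_ann(sample_ann)
```

Each `.ann` file is paired with a `.txt` file of the same base name, and the offsets index into that text, so `sample_txt[11:17]` recovers the mention string of `T1`.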

Japanese Corpus

The following files are included, in the JA subdirectory:

English Corpus

The following files are included, in the EN subdirectory:

The annotation scheme for the English data is more fine-grained than the scheme for the Japanese data, and the former can be converted into the latter. The conversion script is available.
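To give an idea of what a fine-to-coarse conversion can look like on standoff data, here is a minimal sketch that rewrites the label of each entity or relation line according to a lookup table. This is not the conversion script shipped with the corpus: the mapping below uses invented placeholder labels, and the actual conversion may involve more than renaming labels.

```python
# Hypothetical mapping with placeholder labels; the real label inventories
# are defined by the corpus configuration files and annotation guidelines.
FINE_TO_COARSE = {
    "FINE_LABEL_A": "COARSE_LABEL_X",
    "FINE_LABEL_B": "COARSE_LABEL_X",
}


def coarsen_ann_line(line: str, mapping: dict) -> str:
    """Replace the label on a T or R line of a brat .ann file, if mapped."""
    fields = line.split("\t")
    if len(fields) < 2 or not fields[0].startswith(("T", "R")):
        return line  # leave other annotation types and blank lines untouched
    label, sep, rest = fields[1].partition(" ")
    fields[1] = mapping.get(label, label) + sep + rest
    return "\t".join(fields)


converted = coarsen_ann_line("R1\tFINE_LABEL_A Arg1:T1 Arg2:T2", FINE_TO_COARSE)
```

Labels that do not appear in the table are passed through unchanged, which keeps the sketch safe to run on files that mix mapped and unmapped labels.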

Documentation

Annotation guidelines for Japanese and English data are available.

Notes

When you publish the results using the Japanese corpus, please cite

When you publish the results using the English corpus, please cite

This work was partially supported by JSPS Grant-in-Aid for Scientific Research (B) No. 22300031, and by Data Centric Science Research Commons.

Annotations are Copyright (C) 2013-2016 Miyao Lab, National Institute of Informatics, Japan, and are released under the Creative Commons License CC-BY-SA.

IPSJ materials are Copyright (C) 1960-2016 Information Processing Society of Japan; ACL materials are Copyright (C) 1963-2016 ACL; other materials are copyrighted by their respective copyright holders.