University Entrance Examinations as a Benchmark Resource for NLP-based Problem Solving

This paper describes a corpus of university entrance examinations intended to promote research on NLP-based problem solving. Because entrance examinations are designed to quantify human problem-solving ability, they are a desirable resource for benchmarking NLP-based problem-solving systems. However, since entrance examinations cover a wide variety of subjects and question types, pursuing focused research on specific NLP technologies requires breaking entire examinations down into individual NLP subtasks. For this purpose, we provide annotations that classify questions by answer type and knowledge type. We also discuss research issues revealed by the question classification results, and introduce two international shared tasks that used our resource to develop their evaluation data sets.
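As a concrete illustration of how such classifications could be consumed downstream, the sketch below shows one possible programmatic representation of an annotated question. It is a minimal sketch only: the field names and the label inventories (ANSWER_TYPES, KNOWLEDGE_TYPES) are hypothetical assumptions for illustration, not the paper's actual annotation schema.

```python
from dataclasses import dataclass

# Hypothetical label inventories; the corpus's actual answer-type and
# knowledge-type taxonomies may differ.
ANSWER_TYPES = {"multiple_choice", "true_false", "term", "essay"}
KNOWLEDGE_TYPES = {"textbook_fact", "commonsense", "reading_comprehension"}

@dataclass
class ExamQuestion:
    """One entrance-examination question with classification labels."""
    subject: str          # e.g. "world history"
    text: str             # the question text
    answer_type: str      # member of ANSWER_TYPES
    knowledge_type: str   # member of KNOWLEDGE_TYPES

    def __post_init__(self) -> None:
        # Validate labels so malformed annotations fail fast.
        if self.answer_type not in ANSWER_TYPES:
            raise ValueError(f"unknown answer type: {self.answer_type}")
        if self.knowledge_type not in KNOWLEDGE_TYPES:
            raise ValueError(f"unknown knowledge type: {self.knowledge_type}")

# Usage: classification labels let a system route each question to a
# subtask-specific solver (e.g., textual entailment vs. factoid QA).
q = ExamQuestion(
    subject="world history",
    text="Which treaty ended the Thirty Years' War?",
    answer_type="term",
    knowledge_type="textbook_fact",
)
```

A representation along these lines is what makes it possible to carve a full examination into the focused NLP subtasks described above, since each (answer type, knowledge type) pair delimits a benchmark slice for a specific technology.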
