The MCAT Math Retrieval System for NTCIR-11 Math Track

This paper describes the participation of our MCAT search system in the NTCIR-11 Math-2 Task. The purpose of this task is to search for mathematical expressions using hybrid queries containing both formulae and keywords. We introduce an encoding technique to capture the structure and content of mathematical expressions. Each expression is accompanied by two types of automatically extracted textual information, namely the words in a context window and descriptions. In addition, we examine the improvement in ranking obtained by utilizing a dependency graph of mathematical expressions and a post-retrieval reranking method. The results show that using descriptions and the dependency graph together delivers better ranking performance than using the context window. Furthermore, using both descriptions and the context window together delivers even better results. The evaluation results also indicate that our reranking method is effective for improving ranking performance.
