Utilizing dependency relationships between math expressions in math IR

Current mathematical search systems let users query the math expressions in documents using both math expressions and keywords. To support such queries, a math search system must index each document's math expressions together with its textual information. Each indexed math expression is usually associated with all the words in its surrounding context within a fixed window size. However, we found that this context is often ineffective for explaining math expressions in scientific papers. The meaning of an expression is usually defined early in a document, and the meanings of the symbols it contains can help explain the expression as a whole. Such explanations may fall outside the context of the expression unless the window is made very wide, yet widening the window also increases the proportion of words that are unrelated to the expression. This paper proposes using dependency relationships between math expressions to enrich the textual information of each expression, and examines the effect of this enrichment on a math search system. The experimental results show that the enriched textual information yields significantly better precision than each expression's own textual information. This indicates that enriching the textual information of math expressions through dependency relationships improves math search.
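The contrast between window-based context and dependency-based enrichment can be illustrated with a minimal sketch. It is not the paper's implementation, and several assumptions are made here: math expressions are replaced in the running text by hypothetical placeholder tokens such as "MATH_2", each expression is stored as a simplified linear token sequence, "expression A depends on expression B" is approximated by B's tokens occurring contiguously inside A's tokens (a real system would compare parsed structures), and enrichment follows only direct (one-hop) dependencies. All names (MathExpr, window_context, dependency_graph, enriched_context) are hypothetical.

```python
from collections import namedtuple

# Hypothetical representation: each expression is replaced in the text by a
# placeholder token (e.g. "MATH_2") and stored as a linear token sequence.
MathExpr = namedtuple("MathExpr", ["expr_id", "tokens", "position"])


def window_context(doc_tokens, position, size):
    """Baseline context: alphabetic words within `size` tokens of the expression.

    Punctuation and other placeholders (e.g. "MATH_1") fail str.isalpha()
    and are skipped.
    """
    lo = max(0, position - size)
    hi = min(len(doc_tokens), position + size + 1)
    return [w for i, w in enumerate(doc_tokens[lo:hi], start=lo)
            if i != position and w.isalpha()]


def contains(outer, inner):
    """True if `inner` occurs as a contiguous run of tokens inside `outer`."""
    n = len(inner)
    return any(outer[i:i + n] == inner for i in range(len(outer) - n + 1))


def dependency_graph(exprs):
    """Map each expression id to the ids of the shorter expressions it contains.

    Assumption: dependency is approximated by token containment, not by
    comparing parsed expression trees.
    """
    return {
        a.expr_id: [b.expr_id for b in exprs
                    if len(b.tokens) < len(a.tokens) and contains(a.tokens, b.tokens)]
        for a in exprs
    }


def enriched_context(expr_id, contexts, graph):
    """Own context words plus those of each direct dependency (one hop only)."""
    words = list(contexts[expr_id])
    for dep in graph[expr_id]:
        words.extend(contexts[dep])
    return words


# Toy document: the symbols m and c are explained long before E = mc^2 appears.
doc = ("Let MATH_0 denote the mass of the particle . In vacuum light "
       "travels at speed MATH_1 . Hence MATH_2 holds .").split()
exprs = [MathExpr("MATH_0", ("m",), 1),
         MathExpr("MATH_1", ("c",), 15),
         MathExpr("MATH_2", ("E", "=", "m", "c", "^", "2"), 18)]

contexts = {e.expr_id: window_context(doc, e.position, 3) for e in exprs}
graph = dependency_graph(exprs)

print(contexts["MATH_2"])                           # ['Hence', 'holds']
print(enriched_context("MATH_2", contexts, graph))
# ['Hence', 'holds', 'Let', 'denote', 'the', 'mass', 'travels', 'at', 'speed', 'Hence']

# Widening the window instead reaches "mass", but also "vacuum", "light", ...
wide = window_context(doc, 18, 17)
```

In this toy example the own context of E = mc^2 says nothing about its meaning, while the enriched context picks up "mass" and "speed" from the expressions it contains. Stopword removal and term weighting, which a real index would apply, are omitted to keep the sketch short.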
