Layout and Semantics: Combining Representations for Mathematical Formula Search

Math-aware search engines need to support formulae in queries. Mathematical expressions are typically represented as trees that define either their operational semantics or their visual layout. We propose searching both formula representations using a three-layer model. The first layer selects candidates using spectral matching over tree node pairs. The second layer aligns the query with each candidate and computes similarity scores based on structural matching. The third layer combines these similarity scores using linear regression. The two representations are combined by retrieving from parallel indices and applying regression over the resulting similarity scores. For relevance rankings in the NTCIR-12 Wikipedia Formula Browsing task, each layer improves ranking quality, and combining the two representations yields improved results, as measured by Bpref and nDCG scores.
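
The idea of scoring a candidate in both representations and then combining the scores linearly can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and not the method described above: trees are plain (label, children) tuples, candidate matching uses a Dice coefficient over ancestor-descendant label pairs as a stand-in for spectral matching, the structural alignment layer is omitted, and the weights in `combine` are hypothetical placeholders, not coefficients fitted by regression.

```python
def labels(tree):
    """Yield every label in a (label, children) tree, root first."""
    label, children = tree
    yield label
    for child in children:
        yield from labels(child)


def node_pairs(tree):
    """Return the set of (ancestor, descendant) label pairs in a tree."""
    label, children = tree
    pairs = set()
    for child in children:
        # Pair this node with every node in the child's subtree.
        for descendant in labels(child):
            pairs.add((label, descendant))
        # Add the pairs found inside the child's subtree.
        pairs |= node_pairs(child)
    return pairs


def dice(a, b):
    """Dice coefficient between two sets (0.0 if both are empty)."""
    if not a and not b:
        return 0.0
    return 2 * len(a & b) / (len(a) + len(b))


def combine(layout_score, semantic_score, weights=(0.0, 0.5, 0.5)):
    """Linear combination: bias + w_layout * layout + w_semantic * semantic.

    The default weights are hypothetical; in practice they would be
    learned from relevance judgments.
    """
    bias, w_layout, w_semantic = weights
    return bias + w_layout * layout_score + w_semantic * semantic_score


# Query x^2 + 1 and candidate x^2 + y, each in two toy representations.
# Layout tree: x has a superscript child (2) and a next-symbol child (+).
query_layout = ("x", [("2", []), ("+", [("1", [])])])
cand_layout = ("x", [("2", []), ("+", [("y", [])])])
# Semantic (operator) tree: plus(pow(x, 2), operand).
query_semantic = ("plus", [("pow", [("x", []), ("2", [])]), ("1", [])])
cand_semantic = ("plus", [("pow", [("x", []), ("2", [])]), ("y", [])])

layout_score = dice(node_pairs(query_layout), node_pairs(cand_layout))
semantic_score = dice(node_pairs(query_semantic), node_pairs(cand_semantic))
combined = combine(layout_score, semantic_score)

print(round(layout_score, 3))    # 0.5
print(round(semantic_score, 3))  # 0.833
print(round(combined, 3))        # 0.667
```

In this toy case the candidate shares more pairs with the query in the operator tree than in the layout tree, so the two representations give different scores for the same formula pair, and a linear combination lets both contribute to the final ranking.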
