A Mathematical Indexing Method Based on the Hierarchical Features of Operators in Formulae

Widely used full-text search engines still lack a math search function, which makes it inconvenient to find scientific documents using mathematical expressions as queries. Research into the theory and technology of mathematical expression retrieval is therefore needed. This paper proposes an index model for mathematical expressions, derived from an analysis of the characteristics of formulae, to support math retrieval. First, FDS data is obtained by recursively analyzing formulae written in LaTeX. Then, index features, including the level and location features of operators, are extracted from the FDS data. Finally, the extracted features are used to construct a feature vector that divides the formulae into several classes, and a math index is built for each class. An experiment on 134,199 formulae shows that the method improves the efficiency of mathematical expression retrieval.

Keywords: mathematical expression retrieval; index; hierarchical features; operators
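The pipeline summarized above (LaTeX analysis, operator level and location features, class-based index) can be illustrated with a minimal sketch. It is not the paper's implementation: the FDS format is not specified here, so the sketch assumes brace nesting depth as a stand-in for an operator's level and the operator's ordinal position as its location. The operator set and the names `operator_features`, `class_key`, `build_index`, and `search` are all hypothetical.

```python
from collections import defaultdict

# Hypothetical operator inventory; the paper's actual set is not given here.
OPERATORS = {"+", "-", "=", "^", "_", "\\frac", "\\sqrt", "\\sum", "\\int"}


def operator_features(latex):
    """Return (operator, level, location) triples for a LaTeX formula.

    Assumption: level = brace nesting depth, location = ordinal position
    of the operator among all operators found in the formula.
    """
    features = []
    depth = 0
    location = 0
    i = 0
    while i < len(latex):
        ch = latex[i]
        if ch == "{":
            depth += 1
            i += 1
        elif ch == "}":
            depth -= 1
            i += 1
        elif ch == "\\":
            # Read a control word such as \frac; skip escaped symbols like \{.
            j = i + 1
            while j < len(latex) and latex[j].isalpha():
                j += 1
            token = latex[i:j]
            if token in OPERATORS:
                features.append((token, depth, location))
                location += 1
            i = max(j, i + 2)
        else:
            if ch in OPERATORS:
                features.append((ch, depth, location))
                location += 1
            i += 1
    return features


def class_key(features):
    """Build a hashable feature vector: operators ordered by level, then location."""
    ordered = sorted(features, key=lambda f: (f[1], f[2]))
    return tuple((level, op) for op, level, _ in ordered)


def build_index(formulae):
    """Group formula ids by class key, giving one posting list per class."""
    index = defaultdict(list)
    for formula_id, latex in enumerate(formulae):
        index[class_key(operator_features(latex))].append(formula_id)
    return index


def search(index, query):
    """Return ids of indexed formulae in the same class as the query."""
    return list(index.get(class_key(operator_features(query)), []))
```

With this grouping, a query only needs to be compared against formulae in its own class rather than the whole collection, which is the kind of efficiency gain the abstract describes. For example, after `index = build_index([r"a+b=c", r"x+y=z", r"\frac{a+b}{c}"])`, the call `search(index, r"p+q=r")` looks only at the class holding the first two formulae.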
