Partial-match Retrieval with Structure-reflected Indices at the NTCIR-10 Math Task

Fast and accurate math formula search requires an index that holds the structural information of math expressions, unlike the indexing used for full-text search. Although previous research has taken this approach, the resulting indices tend to consume a large amount of memory. This paper proposes a partial-match retrieval system for math formulae that uses two kinds of indices. The first is an inverted index built from the path from each node to the root node, treating each formula as an expression tree. The second is a table that stores the parent node and the text string of each node in the expression trees. For evaluation, we used the hundred thousand documents of the NTCIR-10 Math Task (formula search), which contain 36 million math formulae. The expression trees had about 291 million nodes, and the inverted index held about 9 million distinct paths. Experimental results showed that the search time grows linearly with the number of retrieved documents. Concretely, the search time ranged from 10 milliseconds to 1.2 seconds; simpler formulae tended to need more search time.
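
The two indices described above can be illustrated with a minimal Python sketch. It is not the system evaluated in this paper. The tree encoding `(label, [children])`, the `/`-joined path keys, the sequential node ids, and the names `build_indices` and `path_to_root` are all assumptions made for this illustration.

```python
from collections import defaultdict


def build_indices(formulas):
    """Build both indices from trees encoded as (label, [children]).

    The encoding and key format are hypothetical, chosen for illustration.
    """
    inverted = defaultdict(list)  # path key -> [(formula_id, node_id), ...]
    parent_table = {}             # node_id -> (parent_id or None, label)
    next_id = 0

    def visit(node, parent_id, labels_from_root, formula_id):
        nonlocal next_id
        label, children = node
        node_id = next_id
        next_id += 1
        labels = labels_from_root + (label,)
        parent_table[node_id] = (parent_id, label)
        # The key lists labels from this node up to the root.
        key = "/".join(reversed(labels))
        inverted[key].append((formula_id, node_id))
        for child in children:
            visit(child, node_id, labels, formula_id)

    for formula_id, tree in enumerate(formulas):
        visit(tree, None, (), formula_id)
    return inverted, parent_table


def path_to_root(node_id, parent_table):
    """Recover a node's labels up to the root using only the parent table."""
    labels = []
    while node_id is not None:
        node_id, label = parent_table[node_id]
        labels.append(label)
    return labels


# Two toy formulae: x + y^2 and y^2.
f0 = ("+", [("x", []), ("^", [("y", []), ("2", [])])])
f1 = ("^", [("y", []), ("2", [])])
inverted, parent_table = build_indices([f0, f1])
```

In this sketch, looking up the key `"y/^/+"` in `inverted` returns the single posting for the node `y` inside `x + y^2`, while `path_to_root` rebuilds the same sequence of labels from the parent table alone. How the proposed system combines the two indices to answer partial-match queries is not shown here.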