QUALIBETA at the NTCIR-11 Math 2 Task: An Attempt to Query Math Collections

This work presents our first attempt at retrieving mathematical formulae from a large collection for the NTCIR-11 Math 2 task. Our approach models the collection by combining feature-extracted sequences of the formulae with a sentence-level representation of the text describing them. The feature-extracted sequences are the category of each formula and its sets of identifiers, constants, and operators. This representation, together with the text surrounding the formulae, was indexed in Elasticsearch for query processing. Although the results of our information extraction model are below both the participants' average and our own expectations, the experience will help us improve our work in several directions.
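
As an illustration of the kind of representation described above, the following is a minimal sketch, not the system's actual implementation. It assumes the formulae are available as Presentation MathML and that identifiers, constants, and operators correspond to the `mi`, `mn`, and `mo` token elements; the function names (`extract_features`, `build_document`) and the document field names are hypothetical. The formula category is left out because the abstract does not say how it is derived.

```python
import xml.etree.ElementTree as ET

# Assumed mapping from Presentation MathML token elements to feature sets.
TAG_TO_FEATURE = {"mi": "identifiers", "mn": "constants", "mo": "operators"}


def local_name(tag):
    """Strip an XML namespace prefix such as '{http://...}mi' down to 'mi'."""
    return tag.rsplit("}", 1)[-1]


def extract_features(mathml):
    """Collect the sets of identifiers, constants, and operators in a formula."""
    root = ET.fromstring(mathml)
    features = {name: set() for name in TAG_TO_FEATURE.values()}
    for element in root.iter():
        feature = TAG_TO_FEATURE.get(local_name(element.tag))
        text = (element.text or "").strip()
        if feature and text:
            features[feature].add(text)
    # Sorted lists are JSON-serializable and give a stable ordering.
    return {name: sorted(values) for name, values in features.items()}


def build_document(formula_id, mathml, context_sentence):
    """Build one index-ready record: formula features plus describing text."""
    document = {"formula_id": formula_id, "context": context_sentence}
    document.update(extract_features(mathml))
    return document


SAMPLE = """<math xmlns="http://www.w3.org/1998/Math/MathML">
  <mrow>
    <mi>E</mi><mo>=</mo><mi>m</mi>
    <msup><mi>c</mi><mn>2</mn></msup>
  </mrow>
</math>"""

doc = build_document(
    "f1", SAMPLE, "Mass-energy equivalence relates energy and mass."
)
print(doc)
```

A dictionary of this shape could then be submitted as a JSON document to a search engine such as Elasticsearch, with the feature lists and the context sentence stored as separate fields so that a query can match on either.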