In natural language, words and phrases carry their own semantics. In contrast, the meaning of an identifier in a mathematical formula is not defined by the identifier itself, so scientists must study the surrounding context to decode it. The Mathematical Language Processing (MLP) project aims to support that process. In this paper, we compare two approaches to discovering identifier-definition tuples. First, we use a simple pattern-matching approach. Second, we present the MLP approach, which uses part-of-speech-tag-based distances as well as sentence positions to calculate identifier-definition probabilities. The evaluation of our prototype system, applied to the Wikipedia text corpus, shows that our approach substantially improves the user experience: when the user hovers over an identifier in a formula, a tooltip with the most probable definitions appears. Tests with random samples show that the displayed definitions match the actual meaning of the identifiers well.
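To make the two approaches concrete, the following is a minimal illustrative sketch, not the system evaluated in the paper. The regular expression, the function names `pattern_match` and `rank_definitions`, the `ID` tag for identifiers, and the Gaussian decay with parameters `sigma_d` and `sigma_s` are all assumptions chosen for illustration; the abstract does not specify the actual patterns or the scoring formula.

```python
import math
import re

# Assumed pattern: "<single-letter identifier> is/denotes (the|a|an) <word>".
# The real pattern set used by the authors is not given in the abstract.
PATTERN = re.compile(
    r"\b(?P<id>[A-Za-z])\s+(?:is|denotes)\s+(?:the\s+|an?\s+)?(?P<def>[a-z]+)"
)


def pattern_match(sentence):
    """Approach 1: return (identifier, definition) tuples found by the pattern."""
    return [(m.group("id"), m.group("def")) for m in PATTERN.finditer(sentence)]


def rank_definitions(identifier, tagged_sentences, sigma_d=3.0, sigma_s=2.0):
    """Approach 2 (sketch): score noun candidates near an identifier.

    tagged_sentences is a list of sentences, each a list of (token, pos_tag)
    pairs. A candidate's score decays with its token distance to the
    identifier and with the index of the sentence it occurs in. Both decay
    terms are hypothetical Gaussian weights, not the paper's formula.
    """
    scores = {}
    for s_idx, sentence in enumerate(tagged_sentences):
        positions = [i for i, (tok, _) in enumerate(sentence) if tok == identifier]
        if not positions:
            continue
        for i, (tok, tag) in enumerate(sentence):
            # Only nouns (Penn Treebank tags starting with "NN") are candidates.
            if not tag.startswith("NN") or tok == identifier:
                continue
            dist = min(abs(i - p) for p in positions)
            score = math.exp(-dist ** 2 / (2 * sigma_d ** 2)) * math.exp(
                -s_idx ** 2 / (2 * sigma_s ** 2)
            )
            scores[tok] = max(scores.get(tok, 0.0), score)
    # Highest-scoring candidate first.
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


if __name__ == "__main__":
    print(pattern_match("where E is the energy and m denotes mass"))
    tagged = [[("The", "DT"), ("energy", "NN"), ("E", "ID"),
               ("of", "IN"), ("a", "DT"), ("particle", "NN")]]
    print(rank_definitions("E", tagged))
```

In this sketch the sentences are tagged by hand; in practice the tags would come from a part-of-speech tagger, and the ranked candidates would feed the tooltip shown for each identifier.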