Using Discourse Context to Interpret Object-Denoting Mathematical Expressions

We present a method for determining the context-dependent denotation of simple object-denoting mathematical expressions in mathematical documents. Our approach estimates the similarity between the linguistic context in which a given expression occurs and a set of terms from a flat domain taxonomy of mathematical concepts. The symbol is then interpreted as one of seven head concepts, namely the one dominating the terms with the highest similarity scores to the symbol’s context. The taxonomy was constructed semi-automatically by combining structural and lexical information from the Cambridge Mathematics Thesaurus and the Mathematics Subject Classification. The context information used in the statistical similarity calculation includes lexical features of the discourse immediately adjacent to the given expression as well as of the global discourse. As part of the latter, we include the lexical context of structurally similar expressions throughout the document and that of the symbol’s declaration statement, if one can be found in the document. The approach was evaluated on a gold standard manually annotated by experts and achieved 66% precision.
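As a rough illustration of the idea described above, the following sketch scores an expression's context against the terms of a small flat taxonomy and returns the head concept that dominates the best-matching terms. It is only a sketch under stated assumptions: the taxonomy contents, the function names (`bag`, `cosine`, `interpret`), the `top_k` parameter, and the use of bag-of-words cosine similarity are all hypothetical stand-ins, since the abstract does not specify the actual similarity measure, the seven head concepts, or the features used.

```python
from collections import Counter
from math import sqrt

# Hypothetical toy taxonomy: head concept -> terms it dominates.
# The real taxonomy (seven head concepts) is not given in the abstract.
TAXONOMY = {
    "function": ["continuous function", "differentiable map", "polynomial function"],
    "set": ["open set", "finite set", "subset"],
    "number": ["prime number", "real number", "integer"],
}


def bag(text):
    """Lower-cased bag-of-words representation of a text."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two bags of words (assumed stand-in measure)."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def interpret(local_context, global_context="", top_k=3):
    """Return the head concept dominating the top-scoring terms, or None."""
    # Local context: words adjacent to the expression.
    # Global context: e.g. the declaration statement or the contexts of
    # structurally similar expressions elsewhere in the document.
    context = bag(local_context + " " + global_context)
    scored = [
        (cosine(context, bag(term)), head)
        for head, terms in TAXONOMY.items()
        for term in terms
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    top = [pair for pair in scored[:top_k] if pair[0] > 0]
    if not top:
        return None
    # Sum the scores of the top terms per head concept and pick the best head.
    votes = Counter()
    for score, head in top:
        votes[head] += score
    return votes.most_common(1)[0][0]
```

For example, `interpret("let f be a continuous function on the interval")` selects the hypothetical head concept `"function"`, while a context sharing no words with any taxonomy term yields `None`. A corpus-based measure could be substituted for `cosine` without changing the overall structure.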
