Syntactic role identification of mathematical expressions

This paper presents a prediction algorithm that infers the syntactic role (SR) of mathematical expressions (ME), abbreviated SRME, in sentences that mix MEs with plain text. SRME is a predicted syntax label for an ME that could be integrated into any constituent parser to improve its sentence-parsing accuracy. The prediction is based on three features of ME placement in a sentence: properness of the sentence structure (feature F3), properties of the ME (feature F2), and part-of-speech (PoS) tags of the local neighboring plain text (feature F1). An inside-outside-inspired algorithm is proposed that predicts SRME by maximizing the probability of a relaxed parsing tree. The features in F2 were found to fit both exponential and Poisson distributions; fusing them with the other features to re-weight the prediction rule improved the prediction precision for SRME as a noun phrase by 3.6% and as a noun modifier by 18.7%. F1, F2, and F3 were found to complement each other. Significant discriminative patterns in the PoS tags of the neighboring plain text were used to build a naïve Bayes classifier; fusing it with the F3 baseline improved the precision of predicting SRME as a sentence by 10%. The overall error rate of the SRME prediction algorithm was 15.1% in an experiment on a public data set of parse trees for ME-plaintext mixed sentences provided by Elsevier.
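To make the neighbor-PoS idea concrete, the following is a minimal sketch of a naïve Bayes classifier over the PoS tags immediately to the left and right of an ME. It is not the paper's implementation: the role labels (`NP`, `NML`, `S`), the two-tag feature window, the add-one smoothing, the toy training pairs, and the choice to fuse a baseline by substituting its probabilities for the class prior are all assumptions made for illustration.

```python
import math
from collections import Counter, defaultdict

# Hypothetical role labels: noun phrase, noun modifier, sentence.
ROLES = ("NP", "NML", "S")

def train(examples, alpha=1.0):
    """Count roles and per-slot PoS tags from ((left_pos, right_pos), role) pairs."""
    role_counts = Counter(role for _, role in examples)
    feat_counts = defaultdict(Counter)  # (role, slot) -> tag counts
    vocab = set()
    for (left, right), role in examples:
        feat_counts[(role, "left")][left] += 1
        feat_counts[(role, "right")][right] += 1
        vocab.update((left, right))
    return {"roles": role_counts, "feats": feat_counts,
            "vocab": vocab, "alpha": alpha, "n": len(examples)}

def log_scores(model, left, right, baseline=None):
    """Unnormalized log posterior per role.

    If `baseline` (a dict role -> probability, e.g. from a structural
    model) is given, it replaces the smoothed class prior. This fusion
    rule is an assumption, not the rule used in the paper.
    """
    a = model["alpha"]
    v = len(model["vocab"]) + 1  # one extra slot reserved for unseen tags
    scores = {}
    for role in ROLES:
        rc = model["roles"][role]
        if baseline is not None:
            prior = baseline[role]
        else:
            prior = (rc + a) / (model["n"] + a * len(ROLES))
        s = math.log(prior)
        for slot, tag in (("left", left), ("right", right)):
            count = model["feats"][(role, slot)][tag]
            s += math.log((count + a) / (rc + a * v))
        scores[role] = s
    return scores

def predict(model, left, right, baseline=None):
    scores = log_scores(model, left, right, baseline)
    return max(scores, key=scores.get)

# Toy, hand-made training pairs (Penn Treebank tags); not data from the paper.
EXAMPLES = [
    (("VBZ", "."), "NP"), (("IN", "."), "NP"), (("IN", ","), "NP"),
    (("DT", "NN"), "NML"), (("DT", "NNS"), "NML"),
    (("VBP", "."), "S"), (("WRB", "."), "S"), (("WRB", ","), "S"),
]

model = train(EXAMPLES)
print(predict(model, "DT", "NN"))
print(predict(model, "WRB", "."))
```

Working in log space avoids underflow when more neighbor positions are added, and passing `baseline` shows one simple way a structural score could re-weight the PoS evidence.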
