Informal Mathematical Discourse Parsing with Conditional Random Fields

Discourse parsing of Informal Mathematical Discourse (IMD) is a difficult task, partly because data sets are scarce and partly because Natural Language Processing (NLP) techniques must be adapted to the informality of IMD. In this paper, we present an end-to-end discourse parser, implemented as a sequential classifier of informal deductive argumentations (IDA) for Spanish. The parser uses sequence labeling based on Conditional Random Fields (CRFs), applied to lexical, syntactic and semantic features extracted from a discourse corpus, the Mathematical Discourse TreeBank (MD-TreeBank). It follows the style of Penn Discourse TreeBank (PDTB) end-to-end discourse parsers and is set in the context of Controlled Natural Languages (CNLs). We approach discourse parsing from a low-level perspective, identifying IDA connectives while avoiding complex linguistic phenomena. The parser treats parsing as a connective-level sequence labeling task and classifies several types of informal deductive argumentation within mathematical proofs.
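To make the idea of connective-level sequence labeling concrete, the following is a minimal sketch of Viterbi decoding in a linear-chain CRF over Spanish tokens. It is not the parser described above: the BIO label set, the small connective lexicon, the feature names and all weights are assumptions chosen by hand for illustration, whereas a real CRF would learn its weights from an annotated corpus such as the MD-TreeBank.

```python
from typing import Dict, List, Tuple

# Assumed BIO label set for connective spans (not taken from the paper).
LABELS = ["O", "B-CONN", "I-CONN"]
START = "<s>"

# Hypothetical lexicon of Spanish connective words, for illustration only.
LEXICON = {"por", "tanto", "entonces", "luego", "si"}

# Hand-set weights (assumptions); a trained CRF would estimate these.
EMIT: Dict[Tuple[str, str], float] = {
    ("bias", "O"): 0.5,
    ("bias", "B-CONN"): -1.0,
    ("bias", "I-CONN"): -1.0,
    ("in_lexicon", "B-CONN"): 3.0,
    ("in_lexicon", "I-CONN"): 3.0,
}
TRANS: Dict[Tuple[str, str], float] = {
    (START, "I-CONN"): -4.0,     # a span cannot start with I-CONN
    ("O", "I-CONN"): -4.0,       # I-CONN must follow B-CONN or I-CONN
    ("B-CONN", "I-CONN"): 1.0,   # favour multi-word connectives
}


def token_features(tokens: List[str], i: int) -> List[str]:
    """Simple lexical features for the token at position i."""
    word = tokens[i].lower()
    prev = tokens[i - 1].lower() if i > 0 else START
    feats = ["bias", f"word={word}", f"prev={prev}"]
    if word in LEXICON:
        feats.append("in_lexicon")
    return feats


def emission(tokens: List[str], i: int, label: str) -> float:
    """Sum of the weights of all features active at position i for a label."""
    return sum(EMIT.get((f, label), 0.0) for f in token_features(tokens, i))


def viterbi(tokens: List[str]) -> List[str]:
    """Return the highest-scoring label sequence for the tokens."""
    if not tokens:
        return []
    scores = [{y: TRANS.get((START, y), 0.0) + emission(tokens, 0, y)
               for y in LABELS}]
    back: List[Dict[str, str]] = [{}]
    for i in range(1, len(tokens)):
        scores.append({})
        back.append({})
        for y in LABELS:
            # Best previous label for reaching label y at position i.
            prev = max(LABELS,
                       key=lambda p: scores[i - 1][p] + TRANS.get((p, y), 0.0))
            scores[i][y] = (scores[i - 1][prev] + TRANS.get((prev, y), 0.0)
                            + emission(tokens, i, y))
            back[i][y] = prev
    # Follow back-pointers from the best final label.
    path = [max(LABELS, key=lambda y: scores[-1][y])]
    for i in range(len(tokens) - 1, 0, -1):
        path.append(back[i][path[-1]])
    return path[::-1]


print(viterbi(["por", "tanto", "x", "es", "par"]))
# ['B-CONN', 'I-CONN', 'O', 'O', 'O']
```

With these toy weights the two-word connective "por tanto" is labeled as a single span. In practice the lexical features would be complemented by syntactic and semantic ones, and decoding would usually be delegated to an existing CRF toolkit rather than written by hand.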
