Soft-constrained inference for Named Entity Recognition

Much of the valuable information that supports decision-making processes originates in text-based documents. Although these documents can be effectively searched and ranked by modern search engines, actionable knowledge needs to be extracted and transformed into a structured form before it can be used in a decision process. In this paper we describe how the discovery of semantic information embedded in natural language documents can be viewed as an optimization problem aimed at assigning a sequence of labels (hidden states) to a set of interdependent variables (textual tokens). Dependencies among variables are efficiently modeled through Conditional Random Fields, an undirected graphical model able to represent the distribution of labels given a set of observations. The Markov property of these models prevents them from taking into account long-range dependencies among variables, which are indeed relevant in Natural Language Processing. To overcome this limitation we propose an inference method based on an Integer Programming formulation of the problem, where long-distance dependencies are included through non-deterministic soft constraints.
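
As a rough illustration of the idea, and not the formulation used in the paper, the sketch below treats labeling as maximizing local and transition scores minus a penalty for each violated soft constraint. Everything in it is an assumption made for the example: the label set, the scores, the function names, and the constraint form (two distant positions should receive the same label). Exhaustive enumeration stands in for an Integer Programming solver, so it is only practical for very short sequences.

```python
from itertools import product

# Hypothetical label set for the example.
LABELS = ("O", "PER", "ORG")


def sequence_score(labels, unary, transition):
    """Linear-chain score: per-token scores plus adjacent-label scores."""
    score = sum(unary[i][y] for i, y in enumerate(labels))
    score += sum(transition[(a, b)] for a, b in zip(labels, labels[1:]))
    return score


def soft_penalty(labels, constraints):
    """Each constraint (i, j, w) costs w when positions i and j disagree."""
    return sum(w for i, j, w in constraints if labels[i] != labels[j])


def decode(unary, transition, constraints):
    """Enumerate every labeling and keep the best penalized score.

    Brute force replaces the integer program here; it is exponential
    in the sequence length and meant only as a small demonstration.
    """
    n = len(unary)
    return max(
        product(LABELS, repeat=n),
        key=lambda ys: sequence_score(ys, unary, transition)
        - soft_penalty(ys, constraints),
    )


# Made-up scores for the tokens "Smith said Smith".
unary = [
    {"O": 0.0, "PER": 2.0, "ORG": 0.5},  # first "Smith": clearly a person
    {"O": 2.0, "PER": 0.0, "ORG": 0.0},  # "said"
    {"O": 0.0, "PER": 0.8, "ORG": 1.0},  # second "Smith": ambiguous
]
transition = {(a, b): 0.0 for a in LABELS for b in LABELS}

without = decode(unary, transition, [])
with_constraint = decode(unary, transition, [(0, 2, 0.5)])
print(without)          # ('PER', 'O', 'ORG')
print(with_constraint)  # ('PER', 'O', 'PER')
```

Because the constraint is soft, it only shifts the optimum when its weight outweighs the local evidence: here a penalty of 0.5 exceeds the 0.2 margin by which the last token prefers ORG, so the two occurrences end up with the same label.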
