Learning for Collective Information Extraction

An Information Extraction (IE) system analyses a set of documents with the aim of identifying certain types of entities and relations between them. Most IE systems treat separate potential extractions as independent. However, in many cases, considering influence s between different candidate extractions could improve overall accuracy. For example, phrase repetitions inside a document are usually associated with the same entity type, the same being true for acronyms and their corresponding long form. One of our goals in this thesis is to show how these and potentially other types of correlations can be captured by a particular type of undirected probabilistic graphical mo del. Inference and learning using this graphical model allows for "collective information extraction" in a way that exploits the mutual influence between possible extractions. Preliminary experiments on learnin g to extract named entities from biomedical and newspaper text demonstrate the advantages of our approach. The benefit of doing collective classification comes however at a cost: in the general case, exact infer- ence in the resulting graphical model has an exponential time complexity. The standard solution, which is also the one that we used in our initial work, is to resort to approximate inference. In this proposal we show that by considering only a selected subset of mutual influences between candidate extractions, exact inference can be done in linear time. Consequently, a short term goal is to run comparative exper- iments that would help us choose between the two approaches: exact inference with a restricted subset of mutual influences or approximate inference with the full s et of influences. The set of issues that we intend to investigate in future work is two fold. One direction refers to applying the already developed framework to other natural language tasks that may benefit from the same types of influences, such as word sense disambiguation and pa rt-of-speech tagging. Another direction concerns the design of a sufficiently general framework that would allow a seamless integration of cues from a variety of knowledge sources. We contemplate using generic sources such as external dictionaries, or web statistics on discriminative textual patterns. We al so intend to alleviate the modeling problems due to the intrinsic local nature of entity features by explo iting syntactic information. All these generic features will be input to a feature selection algorithm, so t hat in the end we obtain a model which is both compact and accurate.

[1]  Steffen L. Lauritzen,et al.  Bayesian updating in causal probabilistic networks by local computations , 1990 .

[2]  J. Ross Quinlan,et al.  Learning logical definitions from relations , 1990, Machine Learning.

[3]  Razvan C. Bunescu,et al.  Collective Information Extraction with Relational Markov Networks , 2004, ACL.

[4]  Eleanor Rosch,et al.  Principles of Categorization , 1978 .

[5]  Marti A. Hearst,et al.  A Simple Algorithm for Identifying Abbreviation Definitions in Biomedical Text , 2002, Pacific Symposium on Biocomputing.

[6]  Raymond J. Mooney,et al.  Adaptive duplicate detection using learnable string similarity measures , 2003, KDD '03.

[7]  Adwait Ratnaparkhi,et al.  A Maximum Entropy Model for Part-Of-Speech Tagging , 1996, EMNLP.

[8]  Roni Rosenfeld,et al.  Learning Hidden Markov Model Structure for Information Extraction , 1999 .

[9]  David J. Spiegelhalter,et al.  Probabilistic Networks and Expert Systems , 1999, Information Science and Statistics.

[10]  Fernando Pereira,et al.  Shallow Parsing with Conditional Random Fields , 2003, NAACL.

[11]  Eric Brill,et al.  Transformation-Based Error-Driven Learning and Natural Language Processing: A Case Study in Part-of-Speech Tagging , 1995, CL.

[12]  Raymond J. Mooney,et al.  Relational Learning of Pattern-Match Rules for Information Extraction , 1999, CoNLL.

[13]  Judea Pearl,et al.  Probabilistic reasoning in intelligent systems - networks of plausible inference , 1991, Morgan Kaufmann series in representation and reasoning.

[14]  Dmitry Zelenko,et al.  Kernel Methods for Relation Extraction , 2002, J. Mach. Learn. Res..

[15]  Lawrence R. Rabiner,et al.  A tutorial on hidden Markov models and selected applications in speech recognition , 1989, Proc. IEEE.

[16]  David Yarowsky,et al.  Unsupervised Word Sense Disambiguation Rivaling Supervised Methods , 1995, ACL.

[17]  Michael Collins,et al.  Ranking Algorithms for Named Entity Extraction: Boosting and the VotedPerceptron , 2002, ACL.

[18]  Peter D. Turney Mining the Web for Synonyms: PMI-IR versus LSA on TOEFL , 2001, ECML.

[19]  Andrew McCallum,et al.  Conditional Random Fields: Probabilistic Models for Segmenting and Labeling Sequence Data , 2001, ICML.

[20]  Claire Cardie,et al.  Empirical Methods in Information Extraction , 1997, AI Mag..

[21]  Richard M. Schwartz,et al.  An Algorithm that Learns What's in a Name , 1999, Machine Learning.

[22]  Brendan J. Frey,et al.  Factor graphs and the sum-product algorithm , 2001, IEEE Trans. Inf. Theory.

[23]  George A. Miller,et al.  Introduction to WordNet: An On-line Lexical Database , 1990 .

[24]  W. Freeman,et al.  Generalized Belief Propagation , 2000, NIPS.

[25]  Andrew McCallum,et al.  Efficiently Inducing Features of Conditional Random Fields , 2002, UAI.

[26]  John D. Lafferty,et al.  Inducing Features of Random Fields , 1995, IEEE Trans. Pattern Anal. Mach. Intell..

[27]  Andrew McCallum,et al.  Maximum Entropy Markov Models for Information Extraction and Segmentation , 2000, ICML.

[28]  Andrew McCallum,et al.  Information Extraction with HMMs and Shrinkage , 1999 .

[29]  M. Ashburner,et al.  Gene Ontology: tool for the unification of biology , 2000, Nature Genetics.

[30]  Hwee Tou Ng,et al.  Named Entity Recognition with a Maximum Entropy Approach , 2003, CoNLL.

[31]  D. Rubin,et al.  Maximum likelihood from incomplete data via the EM - algorithm plus discussions on the paper , 1977 .

[32]  Philip Resnik,et al.  THE LINGUIST'S SEARCH ENGINE: GETTING STARTED GUIDE , 2003 .

[33]  Adam L. Berger,et al.  A Maximum Entropy Approach to Natural Language Processing , 1996, CL.

[34]  Aron Culotta,et al.  Dependency Tree Kernels for Relation Extraction , 2004, ACL.

[35]  Erik F. Tjong Kim Sang,et al.  Introduction to the CoNLL-2003 Shared Task: Language-Independent Named Entity Recognition , 2003, CoNLL.

[36]  大西 仁,et al.  Pearl, J. (1988, second printing 1991). Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference. Morgan-Kaufmann. , 1994 .

[37]  Thomas M. Cover,et al.  Elements of Information Theory , 2005 .

[38]  Mark Craven,et al.  Representing Sentence Structure in Hidden Markov Models for Information Extraction , 2001, IJCAI.

[39]  Jorge Nocedal,et al.  On the limited memory BFGS method for large scale optimization , 1989, Math. Program..

[40]  Beatrice Santorini,et al.  Building a Large Annotated Corpus of English: The Penn Treebank , 1993, CL.

[41]  Ben Taskar,et al.  Discriminative Probabilistic Models for Relational Data , 2002, UAI.

[42]  Michael I. Jordan,et al.  Loopy Belief Propagation for Approximate Inference: An Empirical Study , 1999, UAI.

[43]  Kenneth Ward Church,et al.  Word Association Norms, Mutual Information, and Lexicography , 1989, ACL.

[44]  Christopher D. Manning,et al.  Enriching the Knowledge Sources Used in a Maximum Entropy Part-of-Speech Tagger , 2000, EMNLP.

[45]  Nigel Collier,et al.  The GENIA project: corpus-based knowledge acquisition and information extraction from genome research papers , 1999, EACL.

[46]  Peter D. Turney Thumbs Up or Thumbs Down? Semantic Orientation Applied to Unsupervised Classification of Reviews , 2002, ACL.

[47]  Padhraic Smyth,et al.  Segmental semi-markov models and applications to sequence analysis , 2002 .

[48]  Beatrice Santorini,et al.  Part-of-Speech Tagging Guidelines for the Penn Treebank Project (3rd Revision) , 1990 .

[49]  Razvan C. Bunescu,et al.  Associative Anaphora Resolution: A Web-Based Approach , 2003 .

[50]  William W. Cohen,et al.  Exploiting dictionaries in named entity extraction: combining semi-Markov extraction processes and data integration methods , 2004, KDD.

[51]  Rohit J. Kate,et al.  Comparative experiments on learning information extractors for proteins and their interactions , 2005, Artif. Intell. Medicine.