Paper information - Learning for Collective Information Extraction

Learning for Collective Information Extraction

An Information Extraction (IE) system analyses a set of documents with the aim of identifying certain types of entities and relations between them. Most IE systems treat separate potential extractions as independent. However, in many cases, considering influences between different candidate extractions could improve overall accuracy. For example, phrase repetitions inside a document are usually associated with the same entity type, the same being true for acronyms and their corresponding long forms. One of our goals in this thesis is to show how these and potentially other types of correlations can be captured by a particular type of undirected probabilistic graphical model. Inference and learning using this graphical model allow for "collective information extraction" in a way that exploits the mutual influence between possible extractions. Preliminary experiments on learning to extract named entities from biomedical and newspaper text demonstrate the advantages of our approach. The benefit of doing collective classification comes, however, at a cost: in the general case, exact inference in the resulting graphical model has exponential time complexity. The standard solution, which is also the one that we used in our initial work, is to resort to approximate inference. In this proposal we show that by considering only a selected subset of mutual influences between candidate extractions, exact inference can be done in linear time. Consequently, a short-term goal is to run comparative experiments that would help us choose between the two approaches: exact inference with a restricted subset of mutual influences, or approximate inference with the full set of influences. The set of issues that we intend to investigate in future work is twofold. One direction refers to applying the already developed framework to other natural language tasks that may benefit from the same types of influences, such as word sense disambiguation and part-of-speech tagging.
Another direction concerns the design of a sufficiently general framework that would allow a seamless integration of cues from a variety of knowledge sources. We contemplate using generic sources such as external dictionaries, or web statistics on discriminative textual patterns. We also intend to alleviate the modeling problems due to the intrinsic local nature of entity features by exploiting syntactic information. All these generic features will be input to a feature selection algorithm, so that in the end we obtain a model which is both compact and accurate.
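
The contrast between independent and collective extraction described in the abstract can be illustrated with a toy sketch. This is not the model from the thesis: the label set, the local scores, the `agreement_bonus` weight, and both function names are hypothetical choices made only for illustration. Each candidate mention has local label scores, and pairs of mentions with the same text receive a bonus when they share a label. The joint assignment is found by brute-force enumeration, which is exponential in the number of candidates and so also shows why exact inference is costly in the general case.

```python
from itertools import product

# Hypothetical label set for the illustration.
LABELS = ("PROTEIN", "OTHER")


def independent_labels(local_scores):
    """Label every candidate on its own, ignoring the other candidates."""
    return [max(LABELS, key=lambda lab: scores[lab]) for scores in local_scores]


def collective_labels(local_scores, same_text_pairs, agreement_bonus=1.0):
    """Pick the jointly best labeling by enumerating all assignments.

    The score of an assignment is the sum of local scores plus a bonus for
    every pair of repeated mentions that receive the same label. The loop
    visits len(LABELS) ** len(local_scores) assignments.
    """
    best, best_score = None, float("-inf")
    for assignment in product(LABELS, repeat=len(local_scores)):
        score = sum(local_scores[i][lab] for i, lab in enumerate(assignment))
        score += sum(
            agreement_bonus
            for i, j in same_text_pairs
            if assignment[i] == assignment[j]
        )
        if score > best_score:
            best, best_score = list(assignment), score
    return best


# Three candidate mentions; mentions 0 and 2 are the same phrase.
# Mention 2 appears in a weak context, so its local scores are ambiguous.
local_scores = [
    {"PROTEIN": 2.0, "OTHER": 0.0},
    {"PROTEIN": 0.0, "OTHER": 1.0},
    {"PROTEIN": 0.4, "OTHER": 0.6},
]
same_text_pairs = [(0, 2)]

independent = independent_labels(local_scores)
collective = collective_labels(local_scores, same_text_pairs)
print(independent)
print(collective)
```

In this toy setting the independent decision labels the ambiguous repetition differently from its confident first occurrence, while the joint decision makes the two agree.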

Razvan Bunescu

[1] Steffen L. Lauritzen,et al. Bayesian updating in causal probabilistic networks by local computations , 1990 .

[2] J. Ross Quinlan,et al. Learning logical definitions from relations , 1990, Machine Learning.

[3] Razvan C. Bunescu,et al. Collective Information Extraction with Relational Markov Networks , 2004, ACL.

[4] Eleanor Rosch,et al. Principles of Categorization , 1978 .

[5] Marti A. Hearst,et al. A Simple Algorithm for Identifying Abbreviation Definitions in Biomedical Text , 2002, Pacific Symposium on Biocomputing.

[6] Raymond J. Mooney,et al. Adaptive duplicate detection using learnable string similarity measures , 2003, KDD '03.

[7] Adwait Ratnaparkhi,et al. A Maximum Entropy Model for Part-Of-Speech Tagging , 1996, EMNLP.

[8] Roni Rosenfeld,et al. Learning Hidden Markov Model Structure for Information Extraction , 1999 .

[9] David J. Spiegelhalter,et al. Probabilistic Networks and Expert Systems , 1999, Information Science and Statistics.

[10] Fernando Pereira,et al. Shallow Parsing with Conditional Random Fields , 2003, NAACL.

[11] Eric Brill,et al. Transformation-Based Error-Driven Learning and Natural Language Processing: A Case Study in Part-of-Speech Tagging , 1995, CL.