Paper Information - Unsupervised Relation Extraction by Massive Clustering

Unsupervised Relation Extraction by Massive Clustering

The goal of Information Extraction is to automatically generate structured pieces of information from the relevant information contained in text documents. Machine Learning techniques have been applied to reduce the cost of Information Extraction system adaptation. However, elements of human supervision strongly bias the learning process. Unsupervised learning approaches can avoid these biases. In this paper, we propose an unsupervised approach to learning for Relation Detection, based on the use of massive clustering ensembles. On the ACE Relation Mention Detection task, the results outperform the state of the art among unsupervised techniques for this evaluation framework by 5 points of F1 score, and the approach is also simpler and more flexible.

Jordi Turmo | Edgar González
