Modeling Commonality among Related Classes in Relation Extraction

This paper proposes a novel hierarchical learning strategy to deal with the data sparseness problem in relation extraction by modeling the commonality among related classes. For each class in the hierarchy either manually predefined or automatically clustered, a linear discriminative function is determined in a top-down way using a perceptron algorithm with the lower-level weight vector derived from the upper-level weight vector. As the upper-level class normally has much more positive training examples than the lower-level class, the corresponding linear discriminative function can be determined more reliably. The upper-level discriminative function then can effectively guide the discriminative function learning in the lower-level, which otherwise might suffer from limited training data. Evaluation on the ACE RDC 2003 corpus shows that the hierarchical strategy much improves the performance by 5.6 and 5.1 in F-measure on least- and medium- frequent relations respectively. It also shows that our system outperforms the previous best-reported system by 2.7 in F-measure on the 24 subtypes using the same feature set.

[1]  Jian Su,et al.  Exploring Various Knowledge in Relation Extraction , 2005, ACL.

[2]  Nanda Kambhatla,et al.  Combining Lexical, Syntactic, and Semantic Features with Maximum Entropy Models for Information Extraction , 2004, ACL.

[3]  Dan Roth,et al.  Probabilistic Reasoning for Entity & Relation Recognition , 2002, COLING.

[4]  Dmitry Zelenko,et al.  Kernel Methods for Relation Extraction , 2002, J. Mach. Learn. Res..

[5]  John Platt,et al.  Probabilistic Outputs for Support vector Machines and Comparisons to Regularized Likelihood Methods , 1999 .

[6]  Aron Culotta,et al.  Dependency Tree Kernels for Relation Extraction , 2004, ACL.

[7]  Jian Su,et al.  Discovering Relations Between Named Entities from a Large Raw Corpus Using Tree Similarity-Based Clustering , 2005, IJCNLP.

[8]  Ralph Grishman,et al.  Discovering Relations among Named Entities from Large Corpora , 2004, ACL.

[9]  Jian Su,et al.  Named Entity Recognition using an HMM-based Chunk Tagger , 2002, ACL.

[10]  Razvan C. Bunescu,et al.  A Shortest Path Dependency Kernel for Relation Extraction , 2005, HLT.

[11]  Ralph Grishman,et al.  Extracting Relations with Integrated Information Using Kernel Methods , 2005, ACL.

[12]  Leo Breiman,et al.  Bagging Predictors , 1996, Machine Learning.

[13]  Scott Miller,et al.  A Novel Use of Statistical Parsing to Extract Information from Text , 2000, ANLP.

[14]  Michael Collins,et al.  Head-Driven Statistical Models for Natural Language Parsing , 2003, CL.