Convolution Neural Network for Relation Extraction

Deep Neural Network has been applied to many Natural Language Processing tasks. Instead of building hand-craft features, DNN builds features by automatic learning, fitting different domains well. In this paper, we propose a novel convolution network, incorporating lexical features, applied to Relation Extraction. Since many current deep neural networks use word embedding by word table, which, however, neglects semantic meaning among words, we import a new coding method, which coding input words by synonym dictionary to integrate semantic knowledge into the neural network. We compared our Convolution Neural Network CNN on relation extraction with the state-of-art tree kernel approach, including Typed Dependency Path Kernel and Shortest Dependency Path Kernel and Context-Sensitive tree kernel, resulting in a 9% improvement competitive performance on ACE2005 data set. Also, we compared the synonym coding with the one-hot coding, and our approach got 1.6% improvement. Moreover, we also tried other coding method, such as hypernym coding, and give some discussion according the result.

[1]  Jason Weston,et al.  A unified architecture for natural language processing: deep neural networks with multitask learning , 2008, ICML '08.

[2]  Gerhard Paass,et al.  Semantic relation extraction with kernels over typed dependency trees , 2010, KDD.

[3]  Dmitry Zelenko,et al.  Kernel methods for relation extraction , 2003 .

[4]  Michael A. Arbib,et al.  The handbook of brain theory and neural networks , 1995, A Bradford book.

[5]  Nanda Kambhatla,et al.  Combining Lexical, Syntactic, and Semantic Features with Maximum Entropy Models for Information Extraction , 2004, ACL.

[6]  Dan Klein,et al.  Accurate Unlexicalized Parsing , 2003, ACL.

[7]  Yoshua Bengio,et al.  A Neural Probabilistic Language Model , 2003, J. Mach. Learn. Res..

[8]  H. Bourlard,et al.  Auto-association by multilayer perceptrons and singular value decomposition , 1988, Biological Cybernetics.

[9]  Aron Culotta,et al.  Dependency Tree Kernels for Relation Extraction , 2004, ACL.

[10]  Yoshua Bengio,et al.  Convolutional networks for images, speech, and time series , 1998 .

[11]  Nanda Kambhatla,et al.  Combining Lexical, Syntactic, and Semantic Features with Maximum Entropy Models for Information Extraction , 2004, ACL.

[12]  Holger Schwenk,et al.  Large, Pruned or Continuous Space Language Models on a GPU for Statistical Machine Translation , 2012, WLM@NAACL-HLT.

[13]  Yoshua Bengio,et al.  Domain Adaptation for Large-Scale Sentiment Classification: A Deep Learning Approach , 2011, ICML.

[14]  Dan Klein,et al.  Feature-Rich Part-of-Speech Tagging with a Cyclic Dependency Network , 2003, NAACL.

[15]  Chih-Jen Lin,et al.  LIBSVM: A library for support vector machines , 2011, TIST.

[16]  Christiane Fellbaum,et al.  Book Reviews: WordNet: An Electronic Lexical Database , 1999, CL.