Dependency Pattern Models for Information Extraction

Several techniques for the automatic acquisition of Information Extraction (IE) systems have used dependency trees as the basis of their extraction pattern representation. These approaches have used a variety of pattern models, that is, schemes for representing IE patterns based on particular parts of the dependency analysis. An appropriate pattern model should be expressive enough to represent the information to be extracted from text without being overly complex. Previous investigations into the appropriateness of the proposed models have been limited. This paper compares a variety of pattern models, including previously reported ones and variations of them. Each model is evaluated on existing data consisting of IE scenarios from two very different domains: newswire stories and biomedical journal articles. The models are analysed in terms of their ability to represent relevant information, the number of patterns they generate, and their performance on an IE scenario. The best performance was observed for two models that use the majority of the relevant portions of the dependency tree without including irrelevant sections.
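To make the notion of a pattern model concrete, the following is a minimal sketch of two hypothetical models applied to a toy dependency tree. The sentence, the relation labels, and the function names (`verb_argument_patterns`, `chain_patterns`) are assumptions chosen for illustration and are not necessarily the models or data evaluated in the paper. The sketch only shows how the choice of model changes both what can be represented and how many patterns are generated.

```python
from typing import Dict, List, Optional, Tuple

# Toy dependency analysis (illustrative only): word -> (head, relation).
# The root word has head None.
Tree = Dict[str, Tuple[Optional[str], str]]

TREE: Tree = {
    "hired": (None, "root"),
    "Acme": ("hired", "subj"),
    "Smith": ("hired", "obj"),
    "as": ("hired", "mod"),
    "president": ("as", "pcomp"),
}


def root(tree: Tree) -> str:
    """Return the single word whose head is None."""
    return next(w for w, (h, _) in tree.items() if h is None)


def children(tree: Tree, head: str) -> List[str]:
    """Return the direct dependents of `head`."""
    return [w for w, (h, _) in tree.items() if h == head]


def verb_argument_patterns(tree: Tree) -> List[tuple]:
    """Hypothetical simple model: the root verb with its direct subject and object only."""
    r = root(tree)
    args = tuple(sorted(
        (tree[c][1], c) for c in children(tree, r) if tree[c][1] in ("subj", "obj")
    ))
    return [(r,) + args]


def chain_patterns(tree: Tree) -> List[Tuple[str, ...]]:
    """Hypothetical richer model: one pattern per path from the root to a descendant."""
    patterns: List[Tuple[str, ...]] = []

    def walk(node: str, path: Tuple[str, ...]) -> None:
        for child in children(tree, node):
            new_path = path + (child,)
            patterns.append(new_path)
            walk(child, new_path)

    r = root(tree)
    walk(r, (r,))
    return patterns


if __name__ == "__main__":
    print(verb_argument_patterns(TREE))
    print(chain_patterns(TREE))
```

In this toy case the simpler model yields a single pattern that cannot mention "president", while the path-based model yields more patterns and does cover it. This is the kind of trade-off between expressiveness and pattern count that the comparison examines.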
