Customizing an Information Extraction System to a New Domain

We introduce several ideas that improve the performance of supervised information extraction systems with a pipeline architecture when they are customized for new domains. We show that: (a) combining a sequence tagger with a rule-based approach for entity mention extraction yields better performance for both entity and relation mention extraction; (b) improving the identification of syntactic heads of entity mentions helps relation extraction; and (c) a deterministic inference engine captures some of the joint domain structure, even when introduced as a postprocessing step to a pipeline system. Overall, our contributions yield a 20% relative increase in F1 score in a domain significantly different from the domains used during the development of our information extraction system.
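To make the phrase "relative increase in F1" concrete, the following is a minimal sketch of how such a figure is typically computed. The function names and the numbers in the usage note are illustrative assumptions, not values or code from this work.

```python
def f1(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (standard F1 definition)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def relative_increase(baseline: float, improved: float) -> float:
    """Relative change of `improved` over `baseline`, as a fraction.

    Hypothetical helper for illustration: a return value of 0.20
    corresponds to a 20% relative increase.
    """
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return (improved - baseline) / baseline
```

Under this definition, a hypothetical baseline F1 of 0.50 that rises to 0.60 is a 20% relative increase, although the absolute gain is 10 points.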
