In-domain Relation Discovery with Meta-constraints via Posterior Regularization

We present a novel approach to discovering relations and their instantiations from a collection of documents in a single domain. Our approach learns relation types by exploiting meta-constraints that characterize the general qualities of a good relation in any domain. These constraints state that instances of a single relation should exhibit regularities at multiple levels of linguistic structure, including lexicography, syntax, and document-level context. We capture these regularities via the structure of our probabilistic model as well as a set of declaratively-specified constraints enforced during posterior inference. Across two domains our approach successfully recovers hidden relation structure, comparable to or outperforming previous state-of-the-art approaches. Furthermore, we find that a small set of constraints is applicable across the domains, and that using domain-specific constraints can further improve performance.

[1]  Ming-Wei Chang,et al.  Guiding Semi-Supervision with Constraint-Driven Learning , 2007, ACL.

[2]  Dan Roth,et al.  A Linear Programming Formulation for Global Inference in Natural Language Tasks , 2004, CoNLL.

[3]  Ralph Grishman,et al.  An Improved Extraction Pattern Representation Model for Automatic IE Pattern Acquisition , 2003, ACL.

[4]  Richard Johansson,et al.  Extended Constituent-to-Dependency Conversion for English , 2007, NODALIDA.

[5]  Hoifung Poon,et al.  Unsupervised Semantic Parsing , 2009, EMNLP.

[6]  Dan Klein,et al.  Accurate Unlexicalized Parsing , 2003, ACL.

[7]  Luis Gravano,et al.  Snowball: extracting relations from large plain-text collections , 2000, DL '00.

[8]  Regina Barzilay,et al.  Catching the Drift: Probabilistic Content Models, with Applications to Generation and Summarization , 2004, NAACL.

[9]  Patrick Pantel,et al.  DIRT @SBT@discovery of inference rules from text , 2001, KDD '01.

[10]  Dong-Hong Ji,et al.  Automatic Relation Extraction with Model Order Selection and Discriminative Label Identification , 2005, IJCNLP.

[11]  Daniel Jurafsky,et al.  Distant supervision for relation extraction without labeled data , 2009, ACL.

[12]  Oren Etzioni,et al.  The Tradeoffs Between Open and Traditional Relation Extraction , 2008, ACL.

[13]  David M. Blei,et al.  Syntactic Topic Models , 2008, NIPS.

[14]  Thomas Hofmann,et al.  Modeling General and Specific Aspects of Documents with a Probabilistic Topic Model , 2007 .

[15]  Gideon S. Mann,et al.  Generalized Expectation Criteria for Semi-Supervised Learning of Conditional Random Fields , 2008, ACL.

[16]  Ronen Feldman,et al.  Clustering for unsupervised relation identification , 2007, CIKM '07.

[17]  Ellen Riloff,et al.  Automatically Generating Extraction Patterns from Untagged Text , 1996, AAAI/IAAI, Vol. 2.

[18]  Ralph Grishman,et al.  Automatic Acquisition of Domain Knowledge for Information Extraction , 2000, COLING.

[19]  Andrew McCallum,et al.  Generalized Expectation Criteria for Bootstrapping Extractors using Record-Text Alignment , 2009, EMNLP.

[20]  Ralph Grishman,et al.  Discovering Relations among Named Entities from Large Corpora , 2004, ACL.

[21]  Satoshi Sekine,et al.  Preemptive Information Extraction using Unrestricted Relation Discovery , 2006, NAACL.

[22]  Christopher D. Manning,et al.  The Stanford Typed Dependencies Representation , 2008, CF+CDPE@COLING.

[23]  Oren Etzioni,et al.  Unsupervised Methods for Determining Object and Relation Synonyms on the Web , 2014, J. Artif. Intell. Res..

[24]  Razvan C. Bunescu,et al.  Learning to Extract Relations from the Web using Minimal Supervision , 2007, ACL.

[25]  Oren Etzioni,et al.  Open Information Extraction from the Web , 2007, CACM.

[26]  Mirella Lapata,et al.  Automatic Evaluation of Information Ordering: Kendall’s Tau , 2006, CL.

[27]  Jian Su,et al.  Discovering Relations Between Named Entities from a Large Raw Corpus Using Tree Similarity-Based Clustering , 2005, IJCNLP.

[28]  Andrew McCallum,et al.  Collective Cross-Document Relation Extraction Without Labelled Data , 2010, EMNLP.

[29]  Dekang Lin,et al.  DIRT – Discovery of Inference Rules from Text , 2001 .

[30]  David R. Karger,et al.  Content Modeling Using Latent Permutations , 2009, J. Artif. Intell. Res..

[31]  Ji Donghong,et al.  Automatic relation extraction with model order selection and discriminative label identification , 2005 .

[32]  Mark Johnson,et al.  Why Doesn’t EM Find Good HMM POS-Taggers? , 2007, EMNLP.

[33]  Jorge Nocedal,et al.  A Limited Memory Algorithm for Bound Constrained Optimization , 1995, SIAM J. Sci. Comput..

[34]  Ben Taskar,et al.  Expectation Maximization and Posterior Constraints , 2007, NIPS.

[35]  Bo Zhang,et al.  StatSnowball: a statistical approach to extracting entity relationships , 2009, WWW '09.

[36]  Padhraic Smyth,et al.  Modeling General and Specific Aspects of Documents with a Probabilistic Topic Model , 2006, NIPS.