User-driven relational models for entity-relation search and extraction

The ability to extract new knowledge from large datasets is one of the most significant challenges facing society. The problem spans domains from intelligence analysis and scientific research to basic web search. Current information extraction and retrieval tools either lack the flexibility to adapt to evolving information needs or require users to sift through search results and piece together the relevant information themselves. Given the volume of data and the criticality of finding relevant information, new tools and methods are needed to discover and relate relevant pieces of information in ever-expanding repositories of data. We posit that user-driven relational models are needed to collectively learn and discover the fine-grained entities and relations that are relevant to a user's information need. To meet this need, we present a ranked retrieval and extraction framework that collectively learns and integrates evidence of entities and relational dependencies to predict, at query time, a ranking of the sentences containing the most relevant entities and relational dependencies. Using a relational model allows evidence to be leveraged across entity and relation instances. Performing joint inference at query time minimizes NLP pipeline errors and makes it possible to develop more adaptive and discriminative models that meet the user's specific knowledge discovery needs. Our goal is to develop user-driven relational models of entities and their relational dependencies, together with a search system based on these models that allows users to search for known entities and relations, discover new relations from known entities, and discover new entities from known relations. Preliminary qualitative and quantitative evaluations demonstrate the efficacy and potential of the proposed relational modeling approach.
