Breaking Through the Syntax Barrier: Searching with Entities and Relations

The next wave in search technology will be driven by the identification, extraction, and exploitation of real-world entities represented in unstructured textual sources. Search systems will either let users express information needs naturally and analyze them more intelligently, or allow simple enhancements that give users more control over the search process. The data model will exploit graph structure where available, but will not impose structure by fiat. First-generation Web search, which uses graph information at the macroscopic level of inter-page hyperlinks, will be enhanced to use fine-grained graph models involving page regions, tables, sentences, phrases, and real-world entities. New algorithms will combine probabilistic evidence from diverse features to produce responses that are not URLs or pages, but entities and their relationships, or explanations of how multiple entities are related.
