NAGA: Searching and Ranking Knowledge

The Web has the potential to become the world's largest knowledge base. To unleash this potential, the wealth of information available on the Web must be extracted and organized. New querying techniques are needed that are simple, yet more expressive than those provided by standard keyword-based search engines. Searching for knowledge rather than for Web pages must take into account inherent semantic structures such as entities (persons, organizations, etc.) and relationships (isA, locatedIn, etc.). In this paper, we propose NAGA, a new semantic search engine. NAGA builds on a knowledge base that is organized as a graph with typed edges and consists of millions of entities and relationships extracted from Web-based corpora. A graph-based query language enables the formulation of queries with additional semantic information. We introduce a novel scoring model, based on the principles of generative language models, which formalizes several notions, such as confidence, informativeness, and compactness, and uses them to rank query results. We demonstrate NAGA's superior result quality over state-of-the-art search engines and question answering systems.
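The following is a minimal Python sketch of what querying a graph with typed edges can look like. It is not NAGA's implementation. Several parts are hypothetical: the query syntax (triple patterns whose `$`-prefixed terms are variables), the tiny knowledge base, the `bornIn` relation, and the confidence values. Only `isA` and `locatedIn` come from the examples in the abstract. Matching is plain backtracking over the edge patterns. The ranking multiplies the confidences of the matched edges. This is a crude stand-in that captures only the notion of confidence and ignores informativeness and compactness, which NAGA's language-model-based scoring also formalizes.

```python
from typing import Dict, Iterator, List, Optional, Tuple

# Hypothetical data model: a fact is a typed edge (subject, relation, object).
Fact = Tuple[str, str, str]
# A query edge pattern; any term starting with "$" is a variable (assumed syntax).
Pattern = Tuple[str, str, str]
Binding = Dict[str, str]

# Toy knowledge base: edge -> extraction confidence (values invented for illustration).
KB: Dict[Fact, float] = {
    ("Albert_Einstein", "isA", "physicist"): 0.95,
    ("Albert_Einstein", "bornIn", "Ulm"): 0.9,
    ("Ulm", "locatedIn", "Germany"): 0.98,
    ("Max_Planck", "isA", "physicist"): 0.93,
    ("Max_Planck", "bornIn", "Kiel"): 0.85,
    ("Kiel", "locatedIn", "Germany"): 0.97,
    ("Niels_Bohr", "isA", "physicist"): 0.94,
    ("Niels_Bohr", "bornIn", "Copenhagen"): 0.9,
    ("Copenhagen", "locatedIn", "Denmark"): 0.99,
}


def unify(term: str, value: str, binding: Binding) -> Optional[Binding]:
    """Bind a variable term to value, or check that a constant term equals value."""
    if term.startswith("$"):
        if term in binding and binding[term] != value:
            return None  # variable already bound to a different entity
        return {**binding, term: value}
    return binding if term == value else None


def match(query: List[Pattern], binding: Binding, score: float) -> Iterator[Tuple[Binding, float]]:
    """Yield every binding that satisfies all patterns, via backtracking over KB edges."""
    if not query:
        yield binding, score
        return
    (s, r, o), rest = query[0], query[1:]
    for (fs, fr, fo), conf in KB.items():
        b = unify(s, fs, binding)
        if b is not None:
            b = unify(r, fr, b)
        if b is not None:
            b = unify(o, fo, b)
        if b is not None:
            # Crude stand-in score: multiply the confidences of the matched edges.
            yield from match(rest, b, score * conf)


def rank(query: List[Pattern]) -> List[Tuple[Binding, float]]:
    """Return all answers sorted by descending score (not NAGA's scoring model)."""
    return sorted(match(query, {}, 1.0), key=lambda answer: answer[1], reverse=True)


if __name__ == "__main__":
    # Which physicists were born in a city located in Germany?
    query = [
        ("$x", "isA", "physicist"),
        ("$x", "bornIn", "$city"),
        ("$city", "locatedIn", "Germany"),
    ]
    for binding, score in rank(query):
        print(binding["$x"], binding["$city"], round(score, 3))
    # Albert_Einstein Ulm 0.838
    # Max_Planck Kiel 0.767
```

Because `$city` must bind to the same entity in both the `bornIn` and `locatedIn` patterns, the query acts like a join over typed edges. This lets it express a constraint that a flat keyword query cannot. Niels Bohr is excluded because, in this toy data, his birthplace is located in Denmark.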
