Authority-based keyword search in databases

Our system applies authority-based ranking to keyword search in databases modeled as labeled graphs. It uses three ranking factors: the relevance of a result to the query, its specificity, and its importance. All three factors are handled with authority-flow techniques that exploit the link structure of the data graph, in contrast to traditional Information Retrieval. We address the performance challenges of computing authority flows in databases through precomputation and by exploiting the database schema when one is present. We conducted user surveys and performance experiments on multiple real and synthetic datasets to assess the semantic meaningfulness and performance of our system.
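
To make the idea of keyword-specific authority flow concrete, the following is a minimal sketch, not the system's actual implementation. It assumes a personalized-PageRank-style power iteration in which a random surfer restarts only at nodes whose text contains the keyword (a hypothetical "base set"). The function name `authority_scores`, the edge weights, the damping factor of 0.85, and the toy graph are all assumptions made for illustration.

```python
def authority_scores(edges, node_text, keyword, damping=0.85, iterations=100):
    """Illustrative keyword-specific authority flow on a small data graph.

    edges:     list of (source, target, weight) tuples (weights are assumed)
    node_text: dict mapping node id -> text attached to that node
    keyword:   single query keyword
    """
    nodes = list(node_text)
    # Base set: nodes whose text contains the keyword as a whole word.
    base = {n for n in nodes if keyword.lower() in node_text[n].lower().split()}
    if not base:
        return {n: 0.0 for n in nodes}

    # Total outgoing weight per node, used to normalise each node's flow.
    out_weight = {n: 0.0 for n in nodes}
    for src, _dst, w in edges:
        out_weight[src] += w

    # Restart mass is spread uniformly over the base set only.
    jump = {n: (1.0 - damping) / len(base) if n in base else 0.0 for n in nodes}

    scores = dict(jump)
    for _ in range(iterations):
        nxt = dict(jump)
        for src, dst, w in edges:
            # Each node passes a damped share of its authority along its edges.
            nxt[dst] += damping * scores[src] * w / out_weight[src]
        scores = nxt
    # Nodes without outgoing edges simply absorb authority in this sketch.
    return scores


# Toy graph: two papers mention "search" and both cite p2, which does not.
node_text = {
    "p1": "keyword search in databases",
    "p2": "graph ranking",
    "p3": "xml search",
}
edges = [("p1", "p2", 1.0), ("p3", "p2", 1.0)]
scores = authority_scores(edges, node_text, "search")
ranking = sorted(scores, key=scores.get, reverse=True)
```

In this toy graph, `p2` ranks first for the keyword "search" even though it does not contain the word, because authority flows to it from both nodes in the base set. This is the kind of link-based effect that separates authority-flow ranking from purely text-based scoring.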
