Proximity Search in Databases

An information retrieval (IR) engine can rank documents based on textual proximity of keywords within each document. In this paper we apply this notion to search across an entire database for objects that are "near" other relevant objects. Proximity search enables simple "focusing" queries based on general relationships among objects, helpful for interactive query sessions. We view the database as a graph, with data in vertices (objects) and relationships indicated by edges. Proximity is defined based on shortest paths between objects. We have implemented a prototype search engine that uses this model to enable keyword searches over databases, and we have found it very effective for quickly finding relevant information. Computing the distance between objects in a graph stored on disk can be very expensive. Hence, we show how to build compact indexes that allow us to quickly find the distance between objects at search time. Experiments show that our algorithms are effcient and scale well.

[1]  Shaul Dar,et al.  DTL's DataSpot: Database Exploration Using Plain Language , 1998, VLDB.

[2]  I. Good A CAUSAL CALCULUS (I)* , 1961, The British Journal for the Philosophy of Science.

[3]  Jeffrey D. Ullman,et al.  Principles Of Database And Knowledge-Base Systems , 1979 .

[4]  C. M. Sperberg-McQueen,et al.  Extensible Markup Language (XML) , 1997, World Wide Web J..

[5]  Alfred V. Aho,et al.  The Design and Analysis of Computer Algorithms , 1974 .

[6]  Jeffrey D. Uuman Principles of database and knowledge- base systems , 1989 .

[7]  Philip N. Klein,et al.  Faster Shortest-Path Algorithms for Planar Graphs , 1997, J. Comput. Syst. Sci..

[8]  Gerard Salton,et al.  Automatic Text Processing: The Transformation, Analysis, and Retrieval of Information by Computer , 1989 .

[9]  Ronald Fagin,et al.  Combining fuzzy information from multiple systems (extended abstract) , 1996, PODS.

[10]  Roy Goldman,et al.  Lore: a database management system for semistructured data , 1997, SGMD.

[11]  Robert E. Tarjan,et al.  Applications of a planar separator theorem , 1977, 18th Annual Symposium on Foundations of Computer Science (sfcs 1977).

[12]  Hans L. Bodlaender,et al.  A linear time algorithm for finding tree-decompositions of small treewidth , 1993, STOC.

[13]  Jeffrey D. Ullman,et al.  Principles of Database and Knowledge-Base Systems, Volume II , 1988, Principles of computer science series.

[14]  Vijay Kumar,et al.  Improved algorithms and data structures for solving graph problems in external memory , 1996, Proceedings of SPDP '96: 8th IEEE Symposium on Parallel and Distributed Processing.

[15]  Hans L. Bodlaender,et al.  A Tourist Guide through Treewidth , 1993, Acta Cybern..

[16]  Nelson Mendonça Mattos,et al.  Integrating SQL Databases with Content-Specific Search Engines , 1997, VLDB.

[17]  Jennifer Widom,et al.  Object exchange across heterogeneous information sources , 1995, Proceedings of the Eleventh International Conference on Data Engineering.

[18]  RamakrishnanRaghu,et al.  A performance study of transitive closure algorithms , 1994 .

[19]  Paul D. Seymour,et al.  Graph Minors. II. Algorithmic Aspects of Tree-Width , 1986, J. Algorithms.

[20]  Christos D. Zaroliagis,et al.  Shortest Path Queries in Digraphs of Small Treewidth , 1995, ICALP.

[21]  Philip N. Klein,et al.  Cutting down on Fill Using Nested Dissection: Provably Good Elimination Orderings , 1993 .

[22]  Edsger W. Dijkstra,et al.  A note on two problems in connexion with graphs , 1959, Numerische Mathematik.

[23]  Gerald Salton,et al.  Automatic text processing , 1988 .

[24]  Alistair Moffat,et al.  Searching Large Lexicons for Partially Specified Terms using Compressed Inverted Files , 1993, VLDB.

[25]  Christos D. Zaroliagis,et al.  Shortest Paths in Digraphs of Small Treewidth. Part I: Sequential Algorithms , 2000, Algorithmica.

[26]  Rajeev Motwani,et al.  The PageRank Citation Ranking : Bringing Order to the Web , 1999, WWW 1999.

[27]  Philip N. Klein,et al.  Faster Shortest-Path Algorithms for Planar Graphs , 1997, J. Comput. Syst. Sci..

[28]  Ronald Fagin,et al.  Combining Fuzzy Information from Multiple Systems , 1999, J. Comput. Syst. Sci..

[29]  Raghu Ramakrishnan,et al.  A performance study of transitive closure algorithms , 1994, SIGMOD '94.