A practical method for browsing a relational database using a standard search engine

Standard search engines have made looking for information relatively easy and painless. In sharp contrast, most relational database interfaces make searching complicated and confusing, mainly because they require knowledge of specialized query languages and of the schema of the underlying data. In this paper, the authors describe a technique that supports querying a relational database (RDB) through a standard search engine. The technique expresses database queries as URLs and relies on a special wrapper that processes each URL-query and generates a web page containing the answer to the query together with links to additional data. By following these specialized links, a standard web crawler can index the RDB along with all the URL-queries. Once the content and its corresponding URL-queries have been indexed, a user may submit keyword queries through a standard search engine and receive the most current information in the database. Moreover, the system has recently been augmented with standard query syntax and metadata that allow users to formulate more expressive queries and search engines to index the database more efficiently. The authors describe the technique for making database content accessible to the web, provide an evaluation of a prototype system that shows the correctness of their approach, and present experimental results that show how adding metadata can improve the overall efficiency of the search.
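The URL-query and wrapper idea can be illustrated with a minimal sketch. It is not the authors' implementation: the `/db?table=...` URL scheme, the `authors` and `papers` tables, and the helper names `url_query`, `answer`, and `crawl` are all assumptions made for illustration, and an in-memory SQLite database stands in for the RDB.

```python
import re
import sqlite3
from html import escape, unescape
from urllib.parse import parse_qs, urlencode, urlparse

# Hypothetical schema: the tables and columns a URL-query may name.
SCHEMA = {"authors": ["id", "name"], "papers": ["id", "title", "author_id"]}
# Hypothetical foreign keys: (table, column) -> referenced table.
FOREIGN_KEYS = {("papers", "author_id"): "authors"}


def make_db():
    """Create a tiny in-memory database that stands in for the RDB."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE authors (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE papers (id INTEGER PRIMARY KEY, title TEXT, author_id INTEGER);
        INSERT INTO authors VALUES (1, 'Ada');
        INSERT INTO papers VALUES (10, 'Crawling tables', 1);
    """)
    return conn


def url_query(table, **filters):
    """Encode a table name and equality filters as a URL-query."""
    return "/db?" + urlencode({"table": table, **filters})


def answer(conn, url):
    """Wrapper: run the URL-query and render the rows plus links to related data."""
    params = parse_qs(urlparse(url).query)
    table = params.pop("table", [""])[0]
    if table not in SCHEMA:
        raise ValueError(f"unknown table: {table!r}")
    columns = SCHEMA[table]
    # Identifiers come only from SCHEMA; filter values are bound as parameters.
    filters = {c: v[0] for c, v in params.items() if c in columns}
    sql = f"SELECT {', '.join(columns)} FROM {table}"
    if filters:
        sql += " WHERE " + " AND ".join(f"{c} = ?" for c in filters)
    items = []
    for row in conn.execute(sql, list(filters.values())):
        cells = []
        for col, value in zip(columns, row):
            text = escape(str(value))
            target = FOREIGN_KEYS.get((table, col))
            if target:
                # A foreign key becomes a link to another URL-query.
                href = escape(url_query(target, id=value))
                text = f'<a href="{href}">{text}</a>'
            cells.append(f"{col}: {text}")
        items.append("<li>" + "; ".join(cells) + "</li>")
    return "<ul>" + "".join(items) + "</ul>"


def crawl(conn, start):
    """Follow generated links breadth-first, returning {url_query: page}."""
    seen, queue = {}, [start]
    while queue:
        url = queue.pop(0)
        if url in seen:
            continue
        seen[url] = answer(conn, url)
        queue.extend(unescape(h) for h in re.findall(r'href="([^"]+)"', seen[url]))
    return seen
```

Calling `crawl(make_db(), "/db?table=papers")` in this sketch visits the papers listing and then the author page it links to, which is the sense in which an ordinary crawler could reach database rows through URL-queries. Because each indexed URL is itself a query, following it again would re-run the query against the current database state.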
