Deep Web: Databases on the Web

Introduction

Finding information on the Web using a web search engine is one of the primary activities of today's web users. For the majority of users, the results returned by conventional search engines are an essentially complete set of links to all pages on the Web relevant to their queries. However, current search engines do not crawl and index a significant portion of the Web; hence, users who rely on search engines alone cannot discover and access a large amount of information in the non-indexable part of the Web. Specifically, dynamic pages generated from parameters that a user provides via web search forms are not indexed by search engines and do not appear in their results.

Such search interfaces give web users online access to myriad databases on the Web. To obtain information from a web database of interest, a user issues a query by specifying query terms in a search form and receives the query results: a set of dynamic pages that embed the requested information from the database. At the same time, issuing a query via an arbitrary search interface is an extremely complex task for automatic agents of any kind, including web crawlers, which, at least to date, do not even attempt to pass through web forms on a large scale.

Content provided by many web databases is often of very high quality and can be extremely valuable to many users. For example, the PubMed database (http://www.pubmed.gov) allows a user to search through millions of high-quality peer-reviewed papers on biomedical research, while the AutoTrader car classifieds database at http://autotrader.com is highly useful for anyone wishing to buy or sell a car. In general, since each Deep Web searchable database is a collection of data in a specific domain, it can often provide more specific and detailed information that is unavailable or hard to find in the indexable Web.
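The query step described above (filling in a search form and receiving a dynamic page) can be illustrated with a minimal sketch. It assumes a form submitted with the HTTP GET method; the endpoint `https://example.org/search` and the field names `make` and `max_price` are hypothetical and do not correspond to any real site mentioned here. The sketch only builds the URL that a browser would request and performs no network access.

```python
from urllib.parse import urlencode, urlsplit, parse_qs


def build_form_query_url(action_url: str, fields: dict[str, str]) -> str:
    """Return the URL requested when a GET search form is submitted.

    action_url: the form's target address (hypothetical in this sketch).
    fields: the form field names mapped to the values the user typed in.
    """
    return f"{action_url}?{urlencode(fields)}"


# Hypothetical car-classifieds form with two fields.
url = build_form_query_url(
    "https://example.org/search",
    {"make": "Volvo", "max_price": "5000"},
)

# The server side would read the parameters back out of the query string
# and generate the result page from its database.
params = parse_qs(urlsplit(url).query)
print(url)
print(params)
```

The resulting page exists only as a response to this particular combination of field values. A crawler that just follows hyperlinks would typically never request such a URL, because reaching it requires choosing meaningful values for each form field.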
The following section provides background information on the non-indexable Web and web databases.
