Automatic information discovery from the "invisible Web"

A large amount of online information resides on the "invisible Web": Web pages that are generated dynamically from databases and other data sources hidden from the user. Such pages are not indexed under a static URL; they are generated only when a query is made through a search interface (a specialized search engine). In this paper, we propose a system that automatically uses these specialized search engines to find information on the invisible Web. We describe the overall architecture and process, from obtaining the search engines to selecting the right engines to query. Experiments show that the system can find information that traditional search engines do not find.
