Paper information - Information gathering in the World-Wide Web: the W3QL query language and the W3QS system

The World Wide Web (WWW) is a fast-growing global information resource. It contains an enormous amount of information and provides access to a variety of services. Since there is no central control and very few standards of information organization or service offering, searching for information and services is a widely recognized problem. To some degree, this problem is solved by “search services,” also known as “indexers,” such as Lycos, AltaVista, Yahoo, and others. These sites employ search engines known as “robots” or “knowbots” that scan the network periodically and form text-based indices. These services are limited in certain important aspects. First, the structural information, namely, the organization of the document into parts pointing to each other, is usually lost. Second, one is limited by the kind of textual analysis provided by the “search service.” Third, search services are incapable of navigating “through” forms. Finally, one cannot prescribe a complex database-like search. We view the WWW as a huge database. We have designed a high-level SQL-like language called W3QL to support effective and flexible query processing, which addresses the structure and content of WWW nodes and their varied sorts of data. We have implemented a system called W3QS to execute W3QL queries. In W3QS, query results are declaratively specified and, when desired, continuously maintained as views. The current architecture of W3QS provides a server that enables users to pose queries as well as integrate their own data analysis tools. The system and its query language set a framework for the development of database-like tools over the WWW. A significant contribution of this article is in formalizing the WWW and query processing over it.
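The idea of treating the WWW as a database whose queries address both link structure and node content can be illustrated with a small sketch. This is not W3QL syntax and not the W3QS implementation; the `Node` class, the `find_paths` function, and the toy `web` dictionary are hypothetical names introduced here only for illustration. The sketch assumes the Web is modeled as a directed graph of nodes and follows links up to a bounded number of hops, keeping the paths whose final node satisfies a content condition.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Iterator, List, Tuple


@dataclass
class Node:
    """A hypothetical WWW node: a URL, its text content, and its outgoing links."""
    url: str
    text: str
    links: List[str] = field(default_factory=list)


def find_paths(
    web: Dict[str, Node],
    start: str,
    max_hops: int,
    matches: Callable[[Node], bool],
) -> Iterator[Tuple[str, ...]]:
    """Yield each cycle-free link path from `start` with at most `max_hops`
    edges whose final node satisfies the content predicate `matches`."""
    stack = [(start,)]
    while stack:
        path = stack.pop()
        node = web[path[-1]]
        if matches(node):
            yield path
        # Extend the path only while the hop budget allows it.
        if len(path) - 1 < max_hops:
            for target in node.links:
                # Skip links that leave the known graph or revisit a node.
                if target in web and target not in path:
                    stack.append(path + (target,))


# Toy graph standing in for a handful of Web pages (illustrative data only).
web = {
    "a": Node("a", "home page", ["b", "c"]),
    "b": Node("b", "database research group", ["d"]),
    "c": Node("c", "photo album", []),
    "d": Node("d", "query language for a web database", []),
}

# Structural condition: within two hops of "a".
# Content condition: the page text mentions "database".
results = sorted(find_paths(web, "a", 2, lambda n: "database" in n.text))
print(results)
```

A text-only index could report that pages "b" and "d" mention the keyword, but not the link paths that lead to them; returning whole paths is one simple way to keep the structural information that the abstract says is usually lost.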

David Konopnicki | Oded Shmueli
