The Web-DL environment for building digital libraries from the Web

The Web contains a huge volume of unstructured data, which is difficult to manage. In digital libraries, by contrast, information is explicitly organized, described, and managed, and community-oriented services are built to address specific information needs and tasks. In this paper, we describe Web-DL, an environment that allows the construction of digital libraries from the Web. Web-DL allows us to collect data from the Web, standardize it, and publish it through a digital library system. It supports the services and organizational structure normally available in digital libraries while benefiting from the breadth of Web content. We applied the Web-DL environment to the Networked Digital Library of Theses and Dissertations (NDLTD), demonstrating that the rapid construction of digital libraries from the Web is possible. Web-DL also offers an alternative, large-scale solution for interoperability between independent digital libraries.
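The collect, standardize, and publish stages can be illustrated with a minimal sketch. It is not the Web-DL implementation: the function names, the `FIELD_MAP` table, the three-field `Record` schema, and the in-memory catalog are all assumptions made for illustration, and the collection step takes already extracted field dictionaries rather than crawling real pages.

```python
from dataclasses import dataclass

# Hypothetical mapping from source-specific field labels to one shared schema.
FIELD_MAP = {
    "titulo": "title",
    "title": "title",
    "autor": "creator",
    "author": "creator",
    "ano": "date",
    "year": "date",
}


@dataclass(frozen=True)
class Record:
    """Assumed standardized record; a real schema would carry more fields."""
    title: str
    creator: str
    date: str


def collect(pages):
    """Stand-in for Web collection: yield the raw field dicts found on each page."""
    for page in pages:
        yield from page["records"]


def standardize(raw):
    """Rename source-specific fields to the shared schema and trim whitespace."""
    fields = {"title": "", "creator": "", "date": ""}
    for key, value in raw.items():
        target = FIELD_MAP.get(key.strip().lower())
        if target:
            fields[target] = str(value).strip()
    return Record(**fields)


def publish(records):
    """Stand-in for the digital library: index standardized records by creator."""
    catalog = {}
    for record in records:
        catalog.setdefault(record.creator, []).append(record)
    return catalog


def build_library(pages):
    """Chain the three stages: collect, then standardize, then publish."""
    return publish(standardize(raw) for raw in collect(pages))
```

Two sources that label their fields differently end up in one catalog under the same schema, which is the sense in which standardization supports interoperability between independent collections.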
