Flexible Interoperability in a Federated Digital Library of Theses and Dissertations

Federated digital libraries are composed of autonomous, possibly heterogeneous information services distributed across the Internet. Federation gives users a seamless, integrated view of the collected information. We are creating a federated system for the Networked Digital Library of Theses and Dissertations (NDLTD), an international consortium of universities, libraries, and other supporting institutions focused on electronic theses and dissertations (ETDs). The NDLTD imposes minimal restrictions on its members and grants them maximal autonomy, so federation requires dealing flexibly with differences among ontologies, data formats, and finding aids across several thousand ETDs in four formats and two languages. Our solution adapts MARIAN, an object-oriented digital library system, to serve as mediation middleware for the federated NDLTD. Components of the solution include: 1) the use of several harvesting techniques; 2) an architecture based on object-oriented ontologies of searchers and metadata; 3) a single collection view for the user that preserves the diversity of the harvested data; and 4) an integrated framework for addressing such issues as data quality, information compression, and flexible search. The system can handle very large dynamic collections, add new sites, and adapt to changes in existing sites. MARIAN’s modular architecture and its powerful, flexible data model work together to build an effective integrated solution within a simple uniform framework.
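The mediation idea described above can be illustrated with a minimal sketch. It is not MARIAN's implementation: the class names, the two source formats ("flat" and "tagged"), and all field names are hypothetical, chosen only to show how records harvested in different formats might be mapped into one collection view that a single search runs against.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List


@dataclass(frozen=True)
class UnifiedRecord:
    """Common view of one ETD, whatever its source format (hypothetical schema)."""
    site: str
    title: str
    author: str
    language: str


# One adapter per source format. Both formats and their field names are
# invented for illustration; they are not the formats used by the NDLTD.
def from_flat(site: str, raw: Dict[str, str]) -> UnifiedRecord:
    return UnifiedRecord(site, raw["title"], raw["creator"], raw.get("language", "und"))


def from_tagged(site: str, raw: Dict[str, str]) -> UnifiedRecord:
    return UnifiedRecord(site, raw["ttl"], raw["aut"], raw.get("lang", "und"))


ADAPTERS = {"flat": from_flat, "tagged": from_tagged}


class Mediator:
    """Holds harvested records and exposes them as a single collection."""

    def __init__(self) -> None:
        self._records: List[UnifiedRecord] = []

    def harvest(self, site: str, fmt: str, raw_records: Iterable[Dict[str, str]]) -> None:
        # Translate each raw record with the adapter registered for its format.
        adapter = ADAPTERS[fmt]
        for raw in raw_records:
            self._records.append(adapter(site, raw))

    def search(self, term: str) -> List[UnifiedRecord]:
        # Case-insensitive substring match on the title, across all sites.
        needle = term.casefold()
        return [r for r in self._records if needle in r.title.casefold()]


mediator = Mediator()
mediator.harvest("site-a", "flat", [
    {"title": "Federated Search in Digital Libraries", "creator": "A. Author", "language": "en"},
])
mediator.harvest("site-b", "tagged", [
    {"ttl": "Digital Libraries and Metadata", "aut": "B. Author"},
])
hits = mediator.search("digital libraries")
```

In this sketch, adding a new site in a new format would mean registering one more adapter; the search code and the unified record stay unchanged, which is the kind of flexibility the abstract attributes to the mediation layer.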
