Theoretical Foundations for Digital Libraries: The 5S (Societies, Scenarios, Spaces, Structures, Streams) Approach

In 1991, a group of researchers chose the term digital libraries to describe an emerging field of research, development, and practice. Since then, Virginia Tech has had funded research in this area, largely through its Digital Library Research Laboratory. This book is the first in a four-book series that reports our key findings and current research investigations. Underlying the series are six completed dissertations (Gonçalves, Kozievitch, Leidig, Murthy, Shen, Torres), eight dissertations underway, and many master's theses. These reflect our experience with a long string of prototype or production systems developed in the lab, such as CITIDEL, CODER, CTRnet, Ensemble, ETANA, ETD-db, MARIAN, and Open Digital Libraries. There are hundreds of related publications, presentations, tutorials, and reports. We have built upon that work so that this book, and the others in the series, will address digital-library-related needs in many computer science, information science, and library science (e.g., LIS) courses, as well as the requirements of researchers, developers, and practitioners.

Much of the early work in the digital library field struck a balance between addressing real-world needs, integrating methods from related areas, and advancing an ever-expanding research agenda. Our work has fit in with these trends, but has simultaneously been driven by a desire to provide a firm conceptual and formal basis for the field. Our aim has been to move from engineering to science. We claim that our 5S (Societies, Scenarios, Spaces, Structures, Streams) framework, discussed in publications dating back to at least 1998, provides a suitable basis. This book introduces 5S and the key theoretical and formal aspects of the framework. While the 5S framework may be used to describe many types of information systems, and is likely to have even broader utility and appeal, we focus here on digital libraries. Our view of digital libraries is broad, so further generalization should be straightforward. We have connected with related fields, including hypertext/hypermedia, information storage and retrieval, knowledge management, machine learning, multimedia, personal information management, and Web 2.0. Applications have included managing not only publications, but also archaeological information, educational resources, fish images, scientific datasets, and scientific experiments/simulations.

Table of Contents: Introduction / Exploration / Mathematical Preliminaries / Minimal Digital Library / Archaeological Digital Libraries / 5S Results: Lemmas, Proofs, and 5SSuite / Glossary / Bibliography / Authors' Biographies / Index
