Streams, structures, spaces, scenarios, societies (5S): A formal model for digital libraries

Digital libraries (DLs) are complex information systems and therefore demand formal foundations, lest development efforts diverge and interoperability suffer. In this article, we propose the fundamental abstractions of Streams, Structures, Spaces, Scenarios, and Societies (5S), which allow us to define digital libraries rigorously and usefully. Streams are sequences of arbitrary items used to describe both static and dynamic (e.g., video) content. Structures can be viewed as labeled directed graphs, which impose organization. Spaces are sets with operations on those sets that obey certain constraints. Scenarios consist of sequences of events or actions that modify states of a computation in order to accomplish a functional requirement. Societies are sets of entities and activities and the relationships among them. Together these abstractions provide a formal foundation to define, relate, and unify the concepts required to formalize and elucidate "digital libraries", among them digital objects, metadata, collections, and services. The applicability, versatility, and unifying power of the 5S model are demonstrated through its use in three distinct applications: building and interpreting a DL taxonomy, informal and formal analysis of case studies of digital libraries (NDLTD and OAI), and use as a formal basis for a DL description language.
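To make the five abstractions more concrete, the following is a minimal illustrative sketch in Python. It is not the formal definition given in the article: the class names, fields, and the tiny "search" scenario are all assumptions introduced here, chosen only to mirror the one-sentence descriptions above (a stream as a sequence, a structure as a labeled directed graph, a space as a set with operations, a scenario as a sequence of state-modifying events, a society as entities, activities, and relationships).

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List, Sequence, Set, Tuple

# Stream: a sequence of arbitrary items (characters, bytes, video frames, ...).
Stream = Sequence[Any]


@dataclass
class Structure:
    """A labeled directed graph that imposes organization (hypothetical encoding)."""
    nodes: Set[str] = field(default_factory=set)
    # Maps (source, target) to the edge label.
    edges: Dict[Tuple[str, str], str] = field(default_factory=dict)

    def add_edge(self, src: str, dst: str, label: str) -> None:
        self.nodes.update((src, dst))
        self.edges[(src, dst)] = label


@dataclass
class Space:
    """A set together with named operations on that set."""
    elements: Set[Any]
    operations: Dict[str, Callable[..., Any]]


@dataclass
class Scenario:
    """A sequence of events, each mapping a state to a new state."""
    events: List[Callable[[dict], dict]]

    def run(self, state: dict) -> dict:
        for event in self.events:
            state = event(state)
        return state


@dataclass
class Society:
    """Entities, activities, and relationships among them."""
    entities: Set[str]
    activities: Set[str]
    # Each relationship is (entity, relation name, activity or entity).
    relationships: Set[Tuple[str, str, str]]


# Illustrative usage: a thesis as a text stream with a structure over it.
text: Stream = "Chapter 1. Introduction"
thesis = Structure()
thesis.add_edge("thesis", "chapter1", "contains")

# A toy "search" scenario: record the query, then filter the collection.
search = Scenario(events=[
    lambda s: {**s, "query": "5S"},
    lambda s: {**s, "results": [d for d in s["collection"] if s["query"] in d]},
])
final_state = search.run({"collection": ["5S model", "Dexter model"]})

patrons = Society(
    entities={"student", "librarian"},
    activities={"search"},
    relationships={("student", "performs", "search")},
)
```

In this sketch a scenario is deliberately just function composition over a dictionary state; a fuller treatment would also need to capture the constraints on spaces, which the sketch leaves out.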
