Digital Library Technologies: Complex Objects, Annotation, Ontologies, Classification, Extraction, and Security

Since the early 1990s, digital libraries (DLs) have introduced new technologies and have leveraged, enhanced, and integrated related ones. These efforts have been enriched by formal approaches such as the 5S (Societies, Scenarios, Spaces, Structures, Streams) framework, which is discussed in two earlier volumes in this series. This volume should help advance work not only in DLs but also on the WWW and in other information systems. This book draws on four completed dissertations (Kozievitch, Murthy, Park, Yang) and three in progress (Elsherbiny, Farag, Srinivasan). It also draws on the efforts of collaborating researchers and on scores of related publications, presentations, tutorials, and reports. With these sources, it should advance the DL field with regard to at least six key technologies. By integrating surveys of the state of the art, new research, connections with formalization, case studies, and exercises/projects, this book can serve as a computing or information science textbook. It can support studies in cybersecurity, document management, hypertext/hypermedia, information retrieval (IR), knowledge management, library and information science (LIS), multimedia, and machine learning. Chapter 1, with a case study on fingerprint collections, focuses on complex (composite, compound) objects, connecting DL work with related work on buckets, DCC, and OAI-ORE. Chapter 2 discusses annotations, as in hypertext/hypermedia. It emphasizes annotating parts of documents, including images as well as text, and managing superimposed information. The SuperIDR system and prototype efforts with Flickr should motivate further development and standardization of annotation, which would benefit all DL and WWW users. Chapter 3, on ontologies, explains how they help with browsing, query expansion, focused crawling, and classification. This chapter connects DLs with the Semantic Web and uses CTRnet as an example. Chapter 4, on (hierarchical) classification, leverages both LIS theory and machine learning, and is important for DLs as well as the WWW.
Chapter 5, on extraction from text, covers document segmentation. It also covers how to build a database from heterogeneous collections of references drawn from electronic theses and dissertations (ETDs), that is, how to convert reference strings to canonical forms. Chapter 6 surveys the security approaches used in information systems and explains how those approaches can apply to digital libraries that are not fully open. With this content, readers interested in DLs should be able to find solutions to key problems using appropriate technologies and methods. We hope this book will help show how formal approaches can enhance the development of suitable technologies and how those technologies can be better integrated with DLs and other information systems.
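The idea of converting reference strings to canonical forms can be illustrated with a small sketch. It makes two assumptions. First, the author, year, and title have already been extracted from each reference string, so the harder segmentation step is not shown. Second, it uses an illustrative key scheme: the first author's surname, the year, and the first six title words. The helper names `normalize`, `surname`, `canonical_key`, and `group_variants` are hypothetical. This is not the method used in the book.

```python
import re
import unicodedata
from collections import defaultdict


def normalize(text: str) -> str:
    """Lowercase, strip accents, replace punctuation with spaces, collapse whitespace."""
    # Decompose accented characters (e.g. "ç" -> "c" + combining cedilla) and drop the marks.
    decomposed = unicodedata.normalize("NFKD", text)
    no_accents = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # Assumption: ASCII-oriented keys; any character outside [a-z0-9] or whitespace becomes a space.
    cleaned = re.sub(r"[^a-z0-9\s]", " ", no_accents.lower())
    return " ".join(cleaned.split())


def surname(author: str) -> str:
    """Return the normalized surname from 'Surname, Given' or 'Given Surname'."""
    if "," in author:
        return normalize(author.split(",")[0])
    tokens = normalize(author).split()
    return tokens[-1] if tokens else ""


def canonical_key(author: str, year: int, title: str, title_words: int = 6) -> tuple:
    """Hypothetical canonical key: (first-author surname, year, leading title words)."""
    words = normalize(title).split()[:title_words]
    return (surname(author), year, " ".join(words))


def group_variants(records):
    """Group (author, year, title) records whose canonical keys coincide."""
    groups = defaultdict(list)
    for author, year, title in records:
        groups[canonical_key(author, year, title)].append((author, year, title))
    return dict(groups)


if __name__ == "__main__":
    records = [
        ("Kozievitch, N. P.", 2009, "Complex Objects in Digital Libraries"),
        ("N. P. Kozievitch", 2009, "Complex objects in digital libraries."),
        ("Maly, K.", 2001, "Buckets: smart objects for digital libraries"),
    ]
    for key, variants in group_variants(records).items():
        print(key, len(variants))
```

An exact-key scheme like this only merges variants that differ in case, punctuation, accents, or name order. Real reference data would likely need fuzzier matching, for example tolerance for typos or abbreviated titles.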