Epidemiology experimentation and simulation management through scientific digital libraries

Advances in scientific data management, discovery, dissemination, and sharing are changing how scientific studies are conducted and repurposed. Data-intensive scientific practices increasingly require data management services that are not available in existing digital libraries. The issue is complicated by the diversity of functional requirements and content across scientific domains, as well as by scientists’ lack of expertise in information and library science. Researchers who use simulation and experimentation systems need digital libraries to maintain datasets, input configurations, results, analyses, and related documents. A digital library may be integrated with simulation infrastructures to provide automated support for research components, e.g., simulation interfaces to models, data warehouses, simulation applications, computational resources, and storage systems. Managing and provisioning simulation content allows streamlined experimentation, collaboration, discovery, and content reuse within a simulation community. Formal definitions of this class of digital libraries provide a foundation for producing a software toolkit and for the semi-automated generation of digital library instances. We present a generic, component-based SIMulation-supporting Digital Library (SimDL) framework. The framework is formally described and provides a deployable set of domain-free services, schema-based domain knowledge representations, and extensible lower- and higher-level service abstractions. Services in SimDL are specialized for semi-structured simulation content and large-scale data-producing infrastructures, as exemplified in data storage, indexing, and retrieval service implementations. Contributions to the scientific community include previously unavailable simulation-specific services, e.g., incentivizing public contributions, semi-automated content curation, and memoizing simulation-generated data products.
The practicality of SimDL is demonstrated through several case studies in computational epidemiology and network science as well as performance evaluations.

[1]  Stuart Weibel,et al.  The Dublin Core: A Simple Content Description Model for Electronic Resources , 2005 .

[2]  Michael E. Lesk,et al.  Computer Evaluation of Indexing and Text Processing , 1968, JACM.

[3]  Raymond A. Lorie,et al.  Long term preservation of digital information , 2001, JCDL '01.

[4]  Nick Nicholas,et al.  ARCHER - e-Research Tools for Research Data Management , 2009, Int. J. Digit. Curation.

[5]  Carole L. Palmer,et al.  Graduate Curriculum for Biological Information Specialists: A Key to Integration of Scale in Biology , 2007, Int. J. Digit. Curation.

[6]  Gerard Salton,et al.  Term-Weighting Approaches in Automatic Text Retrieval , 1988, Inf. Process. Manag..

[7]  David C. Hay,et al.  Data model patterns : a metadata map , 2006 .

[8]  Shiyali Ramamrita Ranganathan,et al.  The Five Laws of Library Science , 1948 .

[9]  Sandra Payette,et al.  Pathways: augmenting interoperability across scholarly repositories , 2007, International Journal on Digital Libraries.

[10]  Tony R. Martinez,et al.  Improved Heterogeneous Distance Functions , 1996, J. Artif. Intell. Res..

[11]  Edward A. Fox,et al.  SimDL: a model ontology driven digital library for simulation systems , 2011, JCDL '11.

[12]  Seungwon Yang,et al.  ETDs, NDLTD e acesso aberto: uma perspectiva 5S , 2006 .

[13]  Kathleen Fear "You made it, you take care of it": Data Management as Personal Information Management , 2011, Int. J. Digit. Curation.

[14]  Douglas Christopher Gorton,et al.  Practical Digital Library Generation into DSpace with the 5S Framework , 2007 .

[15]  David Groenewegen,et al.  The Data Curation Continuum: Managing Data Objects in Institutional Repositories , 2007, D Lib Mag..

[16]  Kellie Snow,et al.  Making Sense: Talking Data Management with Researchers , 2011, Int. J. Digit. Curation.

[17]  Andrew E. Treloar Design and Implementation of the Australian National Data Service , 2009, Int. J. Digit. Curation.

[18]  Zhiyong Guo,et al.  A Cooperative Service Model for Digital Library Alliances Based on Grid , 2010, 2010 International Conference on Machine Vision and Human-machine Interface.

[19]  Helena Karasti,et al.  Digital Data Practices and the Long Term Ecological Research Program Growing Global , 2008, Int. J. Digit. Curation.

[20]  Antoni Wolski,et al.  A Self-Managing High-Availability Database: Industrial Case Study , 2005, 21st International Conference on Data Engineering Workshops (ICDEW'05).

[21]  Norbert Fuhr,et al.  Digital Libraries: A Generic Classification and Evaluation Scheme , 2001, ECDL.

[22]  Ulla Bøgvad Kejser,et al.  Cost Model for Digital Preservation: Cost of Digital Migration , 2011, Int. J. Digit. Curation.

[23]  Art Rhyno Using Open Source Systems for Digital Libraries , 2003 .

[24]  Patricia Hswe,et al.  Joining in the Enterprise of Response in the Wake of the NSF Data Management Planning Requirement (Feb. 2011) , 2011 .

[25]  Tobias Schreck,et al.  Content-based layouts for exploratory metadata search in scientific research data , 2012, JCDL '12.

[26]  Sarah Louise Timm The Generation and Management of Museum-Centered Geologic Materials and Information , 2012 .

[27]  Noel Enyedy,et al.  Little science confronts the data deluge: habitat ecology, embedded sensor networks, and digital libraries , 2007, International Journal on Digital Libraries.

[28]  Neil F. Abernethy,et al.  Linking information systems for HIV care and research in Kenya , 2010, IHI.

[29]  Les Carr,et al.  Institutional data management blueprint final report , 2011 .

[30]  Edward A. Fox,et al.  5SGraph demo: a graphical modeling tool for digital libraries , 2003, 2003 Joint Conference on Digital Libraries, 2003. Proceedings..

[31]  Thierry Oscar Edoh,et al.  EPharmacyNet: an approach to improve the pharmaceutical care delivery in developing countries-study case-BENIN , 2010, IHI.

[32]  Kai-Uwe Sattler,et al.  Towards Indexing Schemes for Self-Tuning DBMS , 2005, 21st International Conference on Data Engineering Workshops (ICDEW'05).

[33]  Yogesh L. Simmhan,et al.  The Open Provenance Model (v1.01) , 2008 .

[34]  C. Lee Giles,et al.  Similar researcher search in academic environments , 2012, JCDL '12.

[35]  Hans Hagen,et al.  Scientific Visualization: Overviews, Methodologies, and Techniques , 1997 .

[36]  Norman Gray,et al.  Managing Research Data: Gravitational Waves , 2012 .

[37]  Leslie M. Delserone At the Watershed: Preparing for Research Data Management and Stewardship at the University of Minnesota Libraries , 2008, Libr. Trends.

[38]  Martin Donnelly,et al.  The Milieu and the MESSAGE: Talking to Researchers about Data Curation Issues in a Large and Diverse e-Science Project , 2011, Int. J. Digit. Curation.

[39]  Madhav V. Marathe,et al.  Subgraph Enumeration in Large Social Contact Networks Using Parallel Color Coding and Streaming , 2010, 2010 39th International Conference on Parallel Processing.

[40]  Micah Altman,et al.  From Preserving the Past to Preserving the Future: The Data-PASS Project and the Challenges of Preserving Digital Social Science Data , 2009, Libr. Trends.

[41]  Sarah Jones,et al.  DMP Online: A Demonstration of the Digital Curation Centre's Web-Based Tool for Creating, Maintaining and Exporting Data Management Plans , 2010, ECDL.

[42]  Seonho Kim,et al.  ETDs, NDLTD, and open access: a 5S perspective , 2006, Ciência da Informação.

[43]  Hans E. Roosendaal,et al.  Forces and functions in scientific communication , 1997 .

[44]  Catherine Soehner,et al.  E-science and data support services: a study of ARL member institutions , 2010 .

[45]  Mary Larsgaard,et al.  The National Geospatial Digital Archives—Collection Development: Lessons Learned , 2009, Libr. Trends.

[46]  Edward A. Fox,et al.  Evaluating Digital Libraries with 5SQual , 2007, ECDL.

[47]  Marcia Lei Zeng,et al.  Metadata Decisions for Digital Libraries: A Survey Report , 2009 .

[48]  Raquel Hontecillas,et al.  Model of colonic inflammation: immune modulatory mechanisms in inflammatory bowel disease. , 2010, Journal of theoretical biology.

[49]  Chunxiao Xing,et al.  A Cooperative Framework of Service Chain for Digital Library , 2009, 2009 33rd Annual IEEE International Computer Software and Applications Conference.

[50]  M. Marathe,et al.  Economic and social impact of influenza mitigation strategies by demographic class. , 2011, Epidemics.

[51]  Harry E. Pence,et al.  Enhancing learning with online resources, social networking, and digital libraries , 2010 .

[52]  Thomas Robertson,et al.  Requirements for Digital Preservation Systems , 2010 .

[53]  Jeff Haywood,et al.  Research Data Management Initiatives at University of Edinburgh , 2011, Int. J. Digit. Curation.

[54]  Tony Hey,et al.  The Fourth Paradigm: Data-Intensive Scientific Discovery , 2009 .

[55]  Na Li,et al.  oreChem ChemXSeer: a semantic digital library for chemistry , 2010, JCDL '10.

[56]  B. B. Chaudhuri Digital document processing : major directions and recent advances , 2006 .

[57]  Nicholas Joint Data preservation, the new science and the practitioner librarian , 2007 .

[58]  Edward A. Fox,et al.  Streams, structures, spaces, scenarios, societies (5s): A formal model for digital libraries , 2004, TOIS.

[59]  Madhav V. Marathe,et al.  Guiding Health Care Policy through Applied Public Health Modeling and Simulation , 2011 .

[60]  Amanda Ross,et al.  Mathematical modeling of the impact of malaria vaccines on the clinical epidemiology and natural history of Plasmodium falciparum malaria: Overview. , 2006, The American journal of tropical medicine and hygiene.

[61]  Hollie White,et al.  A Metadata Best Practice for a Scientific Data Repository , 2009 .

[62]  Edward A. Fox,et al.  NDLTD: Preparing the next generation of scholars for the information age , 1997 .

[63]  Gregory S. Hunter Preserving Digital Information : A How-To-Do-It Manual , 2000 .

[64]  Jack M. Maness,et al.  Receptivity to Library Involvement in Scientific Data Curation: A Case Study at the University of Colorado Boulder , 2011 .

[65]  Alexander S. Szalay,et al.  Online scientific data curation, publication, and archiving , 2002, SPIE Astronomical Telescopes + Instrumentation.

[66]  Edward A. Fox,et al.  International Journal on Digital Libraries manuscript No. (will be inserted by the editor) A Digital Library Framework for Biodiversity Information Systems , 2022 .

[67]  Carl Lagoze,et al.  NCSTRL: Design and deployment of a globally distributed digital library , 2000, J. Am. Soc. Inf. Sci..

[68]  Jitka Hurych National Digital Preservation Initiatives: An Overview of Developments in Australia, France, The Netherlands and the United Kingdom, and of Related International Activity , 2013 .

[69]  Heiko Schuldt,et al.  On-Demand Service Deployment and Process Support in e-Science DLs:the DILIGENT Experience , 2006 .

[70]  Stuart Macdonald,et al.  Collaboration to Data Curation: Harnessing Institutional Expertise , 2010 .

[71]  Edward A. Fox,et al.  Extending the 5S Digital Library Framework: From a Minimal DL Towards a DL Reference Model , 2007 .

[72]  Daniel Greenstein,et al.  The digital library: A biography , 2002 .

[73]  P. Bryan Heidorn,et al.  Shedding Light on the Dark Data in the Long Tail of Science , 2008, Libr. Trends.

[74]  Jiming Liu,et al.  Effective epidemic control via strategic vaccine deployment: a systematic approach , 2010, IHI.

[75]  Graham Pryor Attitudes and Aspirations in a Diverse World: The Project StORe Perspective on Scientific Repositories , 2007, Int. J. Digit. Curation.

[76]  Sandra Payette,et al.  Flexible and Extensible Digital Object and Repository Architecture (FEDORA) , 1998, ECDL.

[77]  Cecelia DeLuca,et al.  Earth system curator: metadata infrastructure for climate modeling , 2008, Earth Sci. Informatics.

[78]  Tracy Gabridge The Last Mile: Liaison Roles in Curating Science and Engineering Research Data , 2009 .

[79]  Marcos André Gonçalves Streams, Structures, Spaces,Scenarios, and Societies (5S): A Formal Digital Library Framework and Its Applications , 2004 .

[80]  Marisa Ramirez Opinion: Whose role is it anyway? : A library practitioner's appraisal of the digital data deluge , 2011 .

[81]  Jonathan Leidig Digital library support for public health simulation infrastructures , 2012, Bull. IEEE Tech. Comm. Digit. Libr..

[82]  B. K. Ghosh,et al.  Simulation Using Promodel , 2000 .

[83]  Douglas Campbell,et al.  Identifying the Identifiers , 2007, Dublin Core Conference.

[84]  Ross Harvey Digital Curation: A How-To-Do-It Manual , 2010 .

[85]  P. Geurts,et al.  Forces and functions in scientific communication: an analysis of their interplay , 1997 .

[86]  Anita R. Dryden,et al.  Assessing the Academic Library's Role in Campus-Wide Research Data Management: A First Step at the University of Houston , 2011 .

[87]  William R. Hersh,et al.  Proceedings of the 2nd ACM/IEEE-CS joint conference on Digital libraries , 2002 .

[88]  P. Bryan Heidorn,et al.  The Emerging Role of Libraries in Data Curation and E-science , 2011 .

[89]  Edward A. Fox,et al.  Theoretical Foundations for Digital Libraries: The 5S (Societies, Scenarios, Spaces, Structures, Streams) Approach , 2012, Theoretical Foundations for Digital Libraries.

[90]  Michael S. Hsiao,et al.  Experiment and Analysis Services in a Fingerprint Digital Library for Collaborative Research , 2011, TPDL.

[91]  Stuart Macdonald,et al.  User Engagement in Research Data Curation , 2009, ECDL.

[92]  Madhav V. Marathe,et al.  Estimating the Impact of Public and Private Strategies for Controlling an Epidemic: A Multi-Agent Approach , 2009, IAAI.

[93]  Dalia Guerreiro,et al.  Research and Advanced Technology for Digital Libraries , 1997, Lecture Notes in Computer Science.

[94]  Sergio Camorlinga,et al.  An information and communication technology system to support rural healthcare delivery , 2010, IHI.

[95]  Anne E. Trefethen,et al.  The Data Deluge: An e-Science Perspective , 2003 .

[96]  Madhav V. Marathe,et al.  EpiNet: a simulation framework to study the spread of malware in wireless networks , 2009, SimuTools.

[97]  Edward A. Fox,et al.  5SL: a language for declarative specification and generation of digital libraries , 2002, JCDL '02.

[98]  Raghu Ramakrishnan,et al.  Database Management Systems , 1976 .

[99]  Edward A. Lee,et al.  Scientific workflow management and the Kepler system , 2006, Concurr. Comput. Pract. Exp..

[100]  Ian H. Witten,et al.  How to Build a Digital Library , 2002 .

[101]  Ross Wilkinson,et al.  Discovering Australia's research data , 2010, JCDL '10.

[102]  Steven Newhouse,et al.  User Priorities for Data: Results from SUPER , 2007, Int. J. Digit. Curation.

[103]  William L. Anderson Some challenges and issues in managing, and preserving access to, long-lived collections of digital scientific and technical data , 2004, Data Sci. J..

[104]  Volker Linnemann,et al.  Autonomous Index Optimization in XML Databases , 2005, 21st International Conference on Data Engineering Workshops (ICDEW'05).

[105]  Luc Moreau,et al.  The Open Provenance Model , 2007 .

[106]  Edward A. Fox,et al.  Requirements Gathering and Modeling of Domain-Specific Digital Libraries with the 5S Framework: An Archaeological Case Study with ETANA , 2005, ECDL.

[107]  D. Kaplan The Stanley Milgram Papers: A Case Study on Appraisal of and Access to Confidential Data Files , 2009 .

[108]  Matthew S. Mayernik,et al.  Drowning in data: digital library architecture to support scientific use of embedded sensor networks , 2007, JCDL '07.

[109]  Matthew S. Mayernik,et al.  Moving Archival Practices Upstream: An Exploration of the Life Cycle of Ecological Sensing Data in Collaborative Field Research , 2008, Int. J. Digit. Curation.

[110]  Madhav V. Marathe,et al.  An Integrated Modeling Environment to Study the Co-evolution of Networks, Individual Behavior and Epidemics , 2010, AI Mag..

[111]  Panos Constantopoulos,et al.  Research and Advanced Technology for Digital Libraries , 2001, Lecture Notes in Computer Science.

[112]  Madhav V. Marathe,et al.  EpiFast: a fast algorithm for large scale realistic epidemic simulations on distributed memory systems , 2009, ICS.

[113]  Rob Procter,et al.  Development of a Pilot Data Management Infrastructure for Biomedical Researchers at University of Manchester - Approach, Findings, Challenges and Outlook of the MaDAM Project , 2012, Int. J. Digit. Curation.

[114]  B. Library Patterns of information use and exchange: case studies of researchers in the life sciences , 2009 .

[115]  Madhav V. Marathe,et al.  EpiNet: a simulation framework to study the spread of malware in wireless networks , 2009, SIMUTools 2009.

[116]  Fuat Akal,et al.  DILIGENT: integrating digital library and Grid technologies for a new Earth observation research infrastructure , 2007, International Journal on Digital Libraries.

[117]  Edward A. Fox,et al.  Towards a digital library theory: a formal digital library ontology , 2008, International Journal on Digital Libraries.

[118]  Christine L. Borgman,et al.  What are Digital Libraries? Competing Visions , 1999, Inf. Process. Manag..

[119]  Tefko Saracevic,et al.  Digital Library Evaluation: Toward Evolution of Concepts , 2000, Libr. Trends.

[120]  Yolanda Gil,et al.  Pegasus and the Pulsar Search: From Metadata to Execution on the Grid , 2003, PPAM.

[121]  Ross Wilkinson,et al.  Access to Data for eResearch: Designing the Australian National Data Service Discovery Services , 2008, Int. J. Digit. Curation.

[122]  Alex Szalay Towards a National Virtual Observatory , 1998 .

[123]  Michael Witt Institutional Repositories and Research Data Curation in a Distributed Environment , 2008, Libr. Trends.

[124]  Edward A. Fox,et al.  Simulation Tools for Producing Metadata Description Sets Covering Simulation-based Content Collections , 2011 .

[125]  John L. Faundeen The challenge of archiving and preserving remotely sensed data , 2003, Data Sci. J..

[126]  Michael Witt,et al.  Data sharing, small science and institutional repositories , 2010, Philosophical Transactions of the Royal Society A: Mathematical, Physical and Engineering Sciences.

[127]  Alexander Ball,et al.  Challenges and Issues Relating to the Use of Representation Information for the Digital Curation of Crystallography and Engineering Data , 2008, Int. J. Digit. Curation.

[128]  William Foster Digital Libraries: Integrating Content and Systems , 2007, Program.

[129]  Johannes Jm Velterop,et al.  Keeping the Minutes of Science , 1995 .

[130]  Edward A. Fox,et al.  Superimposed Image Description and Retrieval for Fish Species Identification , 2009, ECDL.

[131]  Jin Zhao,et al.  Math information retrieval: user requirements and prototype implementation , 2008, JCDL '08.

[132]  Sriram V. Pemmaraju,et al.  Modeling and estimating the spatial distribution of healthcare workers , 2010, IHI.

[133]  Predrag Knezevic,et al.  A Self-organizing Data Store for Large Scale Distributed Infrastructures , 2005, 21st International Conference on Data Engineering Workshops (ICDEW'05).

[134]  Constance Likonelo Bitso Metadata for Digital Collections: A How‐to‐do‐it Manual , 2012 .

[135]  Paolo Manghi,et al.  An Extensible Virtual Digital Libraries Generator , 2008, ECDL.