Scientific data repositories on the Web: An initial survey

Science Data Repositories (SDRs) have been recognized as both critical to science and undergoing a fundamental change. A Web sample study of 100 SDRs was conducted. Information from the SDRs' websites and administrators was reviewed to determine salient characteristics of the SDRs, which were then used to classify the SDRs into groups through a combination of cluster analysis and logistic regression. These characteristics were explored for their role in determining the groupings and for their relationship to the success of SDRs. Four were identified as important for further investigation: whether the SDR is supported by grants and contracts, whether support comes from multiple sponsors, the size of the SDR's holdings, and whether the SDR has a preservation policy. An inferential framework for understanding SDR composition is discussed; it is guided by observation, the collection and refinement of characteristics, and subsequent analysis of the elements of group membership. The development of SDRs is further examined from a business standpoint and in comparison with their most similar form, institutional repositories. Because this work identifies important characteristics of SDRs, including those that potentially affect their sustainability and success, it is expected to be helpful to SDRs. © 2010 Wiley Periodicals, Inc.
