Are data repositories fettered? A survey of current practices, challenges and future technologies

Purpose
The purpose of this study is to explore the current practices, challenges and technological needs of different data repositories.

Design/methodology/approach
An online survey was designed for data repository managers. It was disseminated using contact information collected from re3data, a registry of data repositories.

Findings
In total, 189 responses were received, 47% from discipline-specific and 34% from institutional data repositories. Of the repositories that reported their software, 71% used bespoke technical frameworks. DSpace, EPrints and Dataverse were commonly used by institutional repositories. Of the repository managers, 32% reported tracking secondary data reuse, while 50% would like to. Among data reuse metrics, the majority considered citation counts extremely important, followed by links to the data from other websites and download counts. Despite the perceived usefulness of citation counts, repository managers struggle to track dataset citations. Most repository managers support dataset and metadata quality checks through librarians, subject specialists or information professionals. A lack of user engagement and a lack of human resources are the top two challenges, and outreach is the most common motivator mentioned by repositories across all groups. Ensuring findable, accessible, interoperable and reusable (FAIR) data (49%), providing user support for research (36%) and developing best practices (29%) are the top three priorities for repository managers. The main recommendations for future repository systems are integration and interoperability between data and systems (30%), better research data management (RDM) tools (19%), tools that allow computation without downloading datasets (16%) and automated systems (16%).

Originality/value
This study identifies the current challenges and needs for improving data repository functionalities and user experiences.

Peer review
The peer review history for this article is available at: https://publons.com/publon/10.1108/OIR-04-2021-0204