Assessing the Impact and Quality of Research Data Using Altmetrics and Other Indicators

Research data in all its diversity (instrument readouts, observations, images, texts, video and audio files, and so on) is the basis for most advancement in the sciences. Yet most research programmes are assessed at the publication level, and data has yet to be treated as a first-class research object. How can and should the research community use indicators to understand the quality and many potential impacts of research data? In this article, we discuss the research into research data metrics, these metrics' strengths and limitations with regard to formal evaluation practices, and the possible meanings of such indicators. We acknowledge the dearth of guidance for using altmetrics and other indicators when assessing the impact and quality of research data, and we suggest heuristics for policymakers and evaluators who wish to do so in the absence of formal governmental or disciplinary policies.

Policy highlights
Research data is an important building block of scientific production, but efforts to develop a framework for assessing data's impacts have had limited success to date.
Indicators such as citations, altmetrics, usage statistics, and reuse metrics highlight, to varying degrees, the influence of research data upon other researchers and the public.
In the absence of a shared definition of "quality", various metrics may be used to measure a dataset's accuracy, currency, completeness, and consistency.
Policymakers interested in setting standards for assessing research data using indicators should take into account indicator availability and disciplinary variations in the data when creating guidelines for explaining and interpreting research data's impact.
Quality metrics are context-dependent: they may vary based upon discipline, data structure, and repository. For this reason, there is no agreed-upon set of indicators that can be used to measure quality.
Citations are well suited to showcasing research impact and are the most widely understood indicator. However, efforts to standardize and promote data citation practices have seen limited success, leading to varying rates of citation data availability across disciplines.
Altmetrics can help illustrate public interest in research, but the availability of altmetrics for research data is very limited.
Usage statistics are typically understood to showcase interest in research data, but infrastructure to standardize these measures has only recently been introduced, and not all repositories report their usage metrics to centralized data brokers such as DataCite.
Reuse metrics vary widely in the kinds of reuse they measure (e.g., educational or scholarly). This category of indicator has the fewest heuristics associated with its collection and use; wherever possible, consider explaining and interpreting reuse with qualitative data.
All research data impact indicators should be interpreted in line with the Leiden Manifesto's principles, including accounting for disciplinary variation and data availability.
Assessing research data impact and quality using numeric indicators is not yet widely practiced, though there is general support for the practice amongst researchers.
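
The highlights name four quality dimensions (accuracy, currency, completeness, and consistency) but, as noted, no agreed-upon way to measure them. The following is a minimal sketch, assuming a small tabular dataset held as Python dictionaries, of how each dimension might be operationalized as a simple ratio or age. The function names, the temperature plausibility rule, and the sample values are hypothetical illustrations, not metrics proposed in this article; real quality checks would depend on discipline, data structure, and repository.

```python
from datetime import date
from typing import Any, Callable

# A record is one row of a hypothetical tabular dataset; None marks a missing value.
Record = dict[str, Any]


def completeness(records: list[Record], fields: list[str]) -> float:
    """Share of expected cells (records x fields) that are present and not None."""
    expected = len(records) * len(fields)
    if expected == 0:
        return 0.0
    filled = sum(1 for r in records for f in fields if r.get(f) is not None)
    return filled / expected


def consistency(records: list[Record], rule: Callable[[Record], bool]) -> float:
    """Share of records that satisfy a domain rule supplied by the evaluator."""
    if not records:
        return 0.0
    return sum(1 for r in records if rule(r)) / len(records)


def accuracy(values: list[Any], reference: list[Any]) -> float:
    """Share of values that match a trusted reference, position by position."""
    if not reference or len(values) != len(reference):
        raise ValueError("values and reference must be non-empty and equal in length")
    return sum(v == ref for v, ref in zip(values, reference)) / len(reference)


def currency_days(last_updated: date, as_of: date) -> int:
    """Age of the dataset in days at the assessment date; lower means more current."""
    return (as_of - last_updated).days


def plausible_temperature(record: Record) -> bool:
    """Hypothetical rule: a recorded air temperature, if present, lies in [-60, 60] °C."""
    temp = record.get("temp_c")
    return temp is None or -60 <= temp <= 60


if __name__ == "__main__":
    # Made-up observations; missing values count against completeness only.
    records = [
        {"site": "A", "temp_c": 12.5},
        {"site": "B", "temp_c": None},
        {"site": "C", "temp_c": 99.0},
        {"site": None, "temp_c": 8.0},
    ]
    print("completeness:", completeness(records, ["site", "temp_c"]))
    print("consistency:", consistency(records, plausible_temperature))
    print("accuracy:", accuracy([12.5, 99.0], [12.5, 9.9]))
    print("currency (days):", currency_days(date(2018, 1, 1), date(2018, 3, 1)))
```

Each score is only as meaningful as the rule or reference behind it: consistency here depends entirely on the chosen plausibility range, and accuracy requires trusted reference values that many datasets will not have.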

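One way to account for disciplinary variation, in the spirit of the Leiden Manifesto, is to compare each dataset's citation count with the average for its own field rather than with all datasets. The sketch below assumes a mean-based normalization (a dataset's count divided by its field's mean count), a common bibliometric convention rather than a method prescribed here; the function name, dataset identifiers, field labels, and counts are made up for illustration.

```python
from collections import defaultdict
from statistics import mean
from typing import Optional


def field_normalized_scores(
    citations: dict[str, int], field_of: dict[str, str]
) -> dict[str, Optional[float]]:
    """Divide each dataset's citation count by the mean count of its field.

    A score of 1.0 means "cited as often as the average dataset in its field".
    """
    # Group raw citation counts by the (hypothetical) field label of each dataset.
    by_field: dict[str, list[int]] = defaultdict(list)
    for dataset, count in citations.items():
        by_field[field_of[dataset]].append(count)

    # The mean citation count per field serves as that field's baseline.
    baseline = {field: mean(counts) for field, counts in by_field.items()}

    scores: dict[str, Optional[float]] = {}
    for dataset, count in citations.items():
        field_mean = baseline[field_of[dataset]]
        # A zero mean means nothing in the field is cited, so there is no
        # baseline to compare against; report None rather than a number.
        scores[dataset] = count / field_mean if field_mean > 0 else None
    return scores


if __name__ == "__main__":
    # Made-up counts: genomics datasets are cited more often overall than ecology ones.
    citations = {"ds1": 10, "ds2": 30, "ds3": 2, "ds4": 6}
    field_of = {"ds1": "genomics", "ds2": "genomics", "ds3": "ecology", "ds4": "ecology"}
    for dataset, score in field_normalized_scores(citations, field_of).items():
        print(dataset, score)
```

With these made-up numbers, ds3 (2 citations) and ds1 (10 citations) receive the same score, because each is cited half as often as the average dataset in its field. Averages over fields with few datasets, or with sparse citation data, are unstable, so such scores are best read alongside the data-availability caveats in the highlights and, where possible, qualitative evidence.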