Separate authorship categories to recognize data collectors and code developers

To the Editor: The current authorship-based system for recognizing individual contributions to science only patchily recognizes the primary data collection that underpins, and the code development that supports, the entire discipline. Data collectors and code developers, the generators of scientific resources, are progressively being forced to donate the grant income, time and effort spent generating, curating and documenting data and code to the discipline as a whole [1–3]. Yet resource users, those who re-use previously published data and code to generate new knowledge and publications, benefit from that time and effort but are not required to recognize it in any standardized manner. We need a new way to quantify and value what is currently anonymous: the fundamental contribution that generating scientific resources makes to scientific progress.

Many scientists agree that authorship is the ultimate reward for collecting data or developing code. However, the Vancouver Protocol tellingly states that “Participation solely in the ... collection of data does not justify authorship.” Citations are routinely raised as the obvious solution to this dilemma [4,5], but they are not enough. Citations carry less value to a scientist than authorship. Moreover, citations to scientific resources are agnostic to the impact of the papers that used those resources; resource citations are commonly buried in supplementary material, where citation-tracking software does not pick them up; and published resources not associated with a published manuscript do not contribute to a scientist’s citation indices.

We suggest that one solution is to divorce authorship of a manuscript from authorship of the resources used in it, by creating separate categories of authorship: manuscript authors and resource authors. Under this scheme, a published paper would come with two separate author lists.
Manuscript authors are those who developed the question, analysed and interpreted the data, and wrote the paper: “authorship for authors” [6]. Resource authors are those who contributed some or all of the data that were analysed or the code that was used. In this system, a resource generator can receive credit for contributing to a paper without any implication that they agree with, understand, or have even seen the analysis and conclusions the manuscript authors have presented. Membership of the two author lists need not be mutually exclusive, as a single person could reasonably contribute both to the resources and to the manuscript.

The set of resource authors from a publication presenting new data or code would be repeated on any subsequent publication that re-uses those resources, whereas the manuscript authors would change to reflect the team conducting the new analysis. This approach extends naturally to meta-analyses. The resource authors on a meta-analysis would include the resource authors, not the manuscript authors, of the publications presenting the original data, along with the authors of unpublished datasets or datasets published in online repositories. Manuscript authorship on a meta-analysis would be restricted to those who conducted the analysis and developed the publication.

Resource authorship provides a path to quantifying the value of a scientist’s provision of resources to the wider community, and could be implemented within the existing citation-based recognition system. Resource contributions could be tracked using exactly the same citation indices already in widespread use, applied to resource rather than manuscript authorship. Scientists contributing data or code that are frequently re-used in highly cited, influential papers would then have higher resource citation metrics than those contributing resources that are infrequently used and published in low-impact papers.
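
The bookkeeping implied by this proposal can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not a specification from the letter: the `Paper` class, the `reuse` helper and the citation counts are hypothetical, and the h-index is used merely as one example of a citation index "already in widespread use".

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Paper:
    """A publication carrying two author lists (hypothetical data model)."""
    title: str
    manuscript_authors: list[str]
    resource_authors: list[str]
    citations: int = 0  # made-up citation count, for illustration only


def reuse(title, manuscript_authors, sources, extra_resource_authors=(), citations=0):
    """Build a paper that re-uses resources from earlier papers.

    Resource authors are inherited from every source paper, followed by
    authors of datasets that have no paper of their own (for example
    unpublished or repository-only data). Order is kept, duplicates dropped.
    """
    inherited = []
    for source in sources:
        for name in source.resource_authors:
            if name not in inherited:
                inherited.append(name)
    for name in extra_resource_authors:
        if name not in inherited:
            inherited.append(name)
    return Paper(title, list(manuscript_authors), inherited, citations)


def h_index(citation_counts):
    """Largest h such that h papers have at least h citations each."""
    h = 0
    for rank, count in enumerate(sorted(citation_counts, reverse=True), start=1):
        if count < rank:
            break
        h = rank
    return h


def resource_h_index(papers, name):
    """h-index over the papers that list `name` as a resource author."""
    return h_index([p.citations for p in papers if name in p.resource_authors])


def manuscript_h_index(papers, name):
    """h-index over the papers that list `name` as a manuscript author."""
    return h_index([p.citations for p in papers if name in p.manuscript_authors])


# Two primary datasets, one re-analysis and one meta-analysis.
survey = Paper("Field survey", ["A"], ["A", "B"], citations=3)
reanalysis = reuse("Re-analysis", ["C"], [survey], citations=10)
second = Paper("Second survey", ["D"], ["B", "D"], citations=1)
meta = reuse("Meta-analysis", ["E"], [survey, second], citations=40)
papers = [survey, reanalysis, second, meta]

print(meta.resource_authors)
print(resource_h_index(papers, "B"), manuscript_h_index(papers, "B"))
```

In this toy example, person "B" never appears as a manuscript author, yet accrues a resource index from every paper that re-uses the data they helped collect, which is the kind of credit the two-list scheme is meant to make visible.
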
Separating the impact of generating scientific resources from the impact of using those resources provides a way out of the tension between resource generators and resource users. The two are complementary aspects of a shared scientific enterprise. Data and reproducible code represent empirical truth: quantitative, repeatable measurements of the world around us against which we test our understanding. The papers we write are our qualitative interpretation of what those data and code tell us; they are ephemeral position statements that implicitly embed the sum of our experiences, knowledge and biases to date. Both are important contributions to the advancement of science, and both need to be represented when quantifying the contribution that individuals make to that advance.

Robert M. Ewers | Carsten Rahbek | Cristina Banks-Leite | Jos Barlow

[1] R. Peng. Reproducible Research in Computational Science, 2011, Science.

[2] K. A. S. Mislan, et al. Elevating the status of code in ecology, 2015, bioRxiv.

[3] Tobias I. Baskin. Keep authorship for writers, 2018, Nature.

[4] Rick L. Stevens, et al. Toward unrestricted use of public genomic data, 2019, Science.

[5] Barbara E. Bierer, et al. Credit data generators for data reuse, 2019, Nature.

[6] M. Whitlock. Data archiving in ecology and evolution: best practices, 2011, Trends in Ecology & Evolution.