A FAIR guide for data providers to maximise sharing of human genomic data

It is generally acknowledged that data sharing is critical for the reproducibility and progress of human genomic research. Every sharing transaction is an exchange between a data consumer and a data provider, and it succeeds only when the consumer can find, obtain and use what the provider has shared. Providers of human genomic data (e.g., publicly or privately funded repositories and data archives) fulfil their social contract with data donors when their shareable data conform to the FAIR (findable, accessible, interoperable, reusable) principles. Drawing on our experience with Repositive (https://repositive.io), a leading discovery platform that catalogues all shared human genomic datasets, we propose guidelines for data providers who wish to maximise the FAIRness of their shared data.