Implementing the FAIR Data Principles in precision oncology: review of supporting initiatives

Abstract

Compelling research has recently shown that cancer is so heterogeneous that single research centres cannot produce enough data to fit prognostic and predictive models of sufficient accuracy. Data sharing in precision oncology is therefore of utmost importance. The Findable, Accessible, Interoperable and Reusable (FAIR) Data Principles have been developed to define good practices in data sharing. Motivated by the ambition of applying the FAIR Data Principles to our own clinical precision oncology implementations and research, we have performed a systematic literature review of potentially relevant initiatives. For clinical data, we suggest using the Genomic Data Commons model as a reference, as it provides a field-tested and well-documented solution. For the classification of diagnoses, tumour morphology and topography, and drugs, we chose to follow the World Health Organization standards, i.e. the ICD-10, ICD-O-3 and Anatomical Therapeutic Chemical (ATC) classifications, respectively. For the bioinformatics pipeline, the Genome Analysis Toolkit (GATK) Best Practices run in Docker containers offer a coherent solution and have therefore been selected. For the naming of variants, we follow the Human Genome Variation Society (HGVS) standard. For the IT infrastructure, we have built a centralized solution that participates in data sharing through federated solutions such as the Beacon Network.
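Choosing one coding standard per data type only helps interoperability if submitted records actually follow it. The Python sketch below illustrates a syntax-only check a site might run before sharing records. It is an illustration, not the authors' implementation. The field names (`diagnosis_icd10`, `drug_atc`, and so on) and helper functions are hypothetical. The patterns check only how a code is written (for example ICD-O-3 morphology `8140/3` or a 7-character ATC code), not whether the code exists in the official WHO lists. The HGVS pattern covers only simple coding-DNA substitutions on RefSeq `NM_` transcripts, a small subset of the full HGVS grammar. The record values are illustrative.

```python
import re

# Hypothetical, syntax-only patterns: they check how a code is written,
# not whether it exists in the official WHO classifications or HGVS grammar.
ICD10_RE = re.compile(r"[A-Z]\d{2}(\.\d)?")               # diagnosis, e.g. C61 or C50.9
ICDO3_TOPO_RE = re.compile(r"C\d{2}\.\d")                  # topography, e.g. C61.9
ICDO3_MORPH_RE = re.compile(r"\d{4}/[012369][1-9]?")       # histology/behaviour[grade], e.g. 8140/3
ATC_RE = re.compile(r"[ABCDGHJLMNPRSV]\d{2}[A-Z]{2}\d{2}")  # full 5th-level ATC code
# Only simple coding-DNA substitutions such as NM_000000.0:c.100A>G are supported.
HGVS_C_SUB_RE = re.compile(r"(NM_\d+\.\d+):c\.(\d+)([ACGT])>([ACGT])")

FIELD_PATTERNS = {
    "diagnosis_icd10": ICD10_RE,
    "topography_icdo3": ICDO3_TOPO_RE,
    "morphology_icdo3": ICDO3_MORPH_RE,
    "drug_atc": ATC_RE,
    "variant_hgvs": HGVS_C_SUB_RE,
}


def atc_levels(code: str) -> list[str]:
    """Split a 7-character ATC code into its five hierarchical levels."""
    if ATC_RE.fullmatch(code) is None:
        raise ValueError(f"not a 7-character ATC code: {code!r}")
    # Anatomical main group, therapeutic, pharmacological and chemical subgroups, substance.
    return [code[:1], code[:3], code[:4], code[:5], code]


def parse_hgvs_c_substitution(variant: str) -> dict:
    """Parse a simple HGVS coding-DNA substitution into its components."""
    m = HGVS_C_SUB_RE.fullmatch(variant)
    if m is None:
        raise ValueError(f"unsupported or malformed HGVS description: {variant!r}")
    transcript, position, ref, alt = m.groups()
    if ref == alt:
        raise ValueError("reference and alternate bases must differ")
    return {"transcript": transcript, "position": int(position), "ref": ref, "alt": alt}


def invalid_fields(record: dict) -> list[str]:
    """Return the names of fields that are missing or do not match the expected syntax."""
    return [
        field
        for field, pattern in FIELD_PATTERNS.items()
        if pattern.fullmatch(str(record.get(field) or "")) is None
    ]


# Illustrative record; the values are syntactically valid examples only.
record = {
    "diagnosis_icd10": "C61",
    "topography_icdo3": "C61.9",
    "morphology_icdo3": "8140/3",
    "drug_atc": "L02BX03",
    "variant_hgvs": "NM_004333.4:c.1799T>A",
}
print(invalid_fields(record))  # []
print(atc_levels("L02BX03"))   # ['L', 'L02', 'L02B', 'L02BX', 'L02BX03']
```

Keeping the checks purely syntactic makes them cheap to run at the point of data entry. Verifying that a code is actually assigned would require loading the official WHO code tables or an HGVS validator, which this sketch does not attempt.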
