Case Studies for Overcoming Challenges in Using Big Data in Cancer

Abstract: The analysis of big healthcare data has enormous potential as a tool for advancing oncology drug development and patient treatment, particularly in the context of precision medicine. However, organizing, sharing, and integrating these data, and making them readily accessible to the research community, remain challenging. This review presents five case studies illustrating successful approaches to addressing these challenges: CancerLinQ, the American Association for Cancer Research Project GENIE, Project Data Sphere, the National Cancer Institute Genomic Data Commons, and the Veterans Health Administration Clinical Data Initiative. Critical factors in the development of these systems include robust pipelines for data aggregation, common data models, data deidentification to enable multiple uses, integration of data collection into physician workflows, terminology standardization and attention to interoperability, extensive quality assurance and quality control activities, incorporation of multiple data types, and an understanding of how data resources can best be applied. By describing some of these emerging resources, we hope to encourage consideration of the secondary use of such data at the earliest possible stage, so that data are shared properly and generate insights that advance the understanding and treatment of cancer.
