From space to biomedicine: Enabling biomarker data science in the cloud.

NASA's Jet Propulsion Laboratory (JPL) is advancing data science research capabilities with two of the National Cancer Institute's major research programs, the Early Detection Research Network (EDRN) and the Molecular and Cellular Characterization of Screen-Detected Lesions (MCL), by enabling data-driven discovery for cancer biomarker research. The research team pioneered a national data science ecosystem for cancer biomarker research to capture, process, manage, share, and analyze data across multiple research centers. The biomarker research community is adapting software and data-driven methods originally developed for space and Earth science research to meet the data and computational demands of analyzing its own research data. This includes linking diverse data, from clinical phenotypes to imaging to genomics. The data science infrastructure captures and links data, from over 1,600 annotations of cancer biomarkers to terabytes of analysis results, on the cloud in a biomarker data commons known as "LabCAS". As data volumes grow, it is critical to develop automated approaches that "plug" laboratories and instruments into the data science infrastructure so that data are captured and analyzed systematically and directly. This includes applying artificial intelligence and machine learning to automate annotation and scale scientific analysis.
