FAIR‐compliant clinical, radiomics and DICOM metadata of RIDER, interobserver, Lung1 and head‐Neck1 TCIA collections

Purpose One of the most frequently cited radiomics investigations showed that features automatically extracted from routine clinical images could be used in prognostic modeling. These images have been made publicly accessible via The Cancer Imaging Archive (TCIA). There have been numerous requests for additional explanatory metadata on four of these datasets: RIDER, Interobserver, Lung1, and Head-Neck1. To support repeatability, reproducibility, generalizability, and transparency in radiomics research, we publish the subjects’ clinical data, extracted radiomics features, and Digital Imaging and Communications in Medicine (DICOM) headers of these four datasets with descriptive metadata, in order to comply more closely with the findable, accessible, interoperable, and reusable (FAIR) data management principles.
Acquisition and validation methods Overall survival time intervals were updated using a national citizens registry after internal ethics board approval. Spatial offsets of the primary gross tumor volume (GTV) regions of interest (ROIs) associated with the Lung1 CT series were improved on TCIA. GTV radiomics features were extracted using the open-source Ontology-Guided Radiomics Analysis Workflow (O-RAW). We reshaped the output of O-RAW to map features and extraction settings to the latest version of the Radiomics Ontology, so as to be consistent with the Image Biomarker Standardization Initiative (IBSI). DICOM metadata were extracted using a research version of Semantic DICOM (SOHARD GmbH, Fuerth, Germany). Subjects’ clinical data were described with metadata using the Radiation Oncology Ontology. All of the above were published in Resource Description Framework (RDF) format, that is, as triples. Example SPARQL queries for the online triples archive are shared with the reader to illustrate how to exploit this data submission.
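To make the notion of triples concrete, the following is a minimal sketch in plain Python of how subject-predicate-object statements can link clinical, DICOM, and radiomics records. It is not the published data model: the `ex:` identifiers, predicate names, and values are hypothetical placeholders chosen for illustration, and the actual data use terms from the Radiation Oncology Ontology, Semantic DICOM, and the Radiomics Ontology.

```python
from typing import List, Optional, Tuple

# A triple is a (subject, predicate, object) statement.
Triple = Tuple[str, str, str]

# Hypothetical triples: all identifiers, predicates, and values are
# illustrative placeholders, not terms or data from the published archive.
TRIPLES: List[Triple] = [
    ("ex:patient1", "ex:hasSurvivalDays", "512"),        # clinical
    ("ex:patient1", "ex:hasCTSeries", "ex:series1"),     # link to imaging
    ("ex:series1", "ex:sliceThicknessMm", "3.0"),        # DICOM header
    ("ex:series1", "ex:hasGTVFeature", "ex:feature1"),   # link to radiomics
    ("ex:feature1", "ex:featureName", "firstorder_Mean"),
    ("ex:feature1", "ex:featureValue", "-42.5"),
]


def match(triples: List[Triple], s: Optional[str] = None,
          p: Optional[str] = None, o: Optional[str] = None) -> List[Triple]:
    """Return triples matching a pattern; None acts as a wildcard."""
    return [t for t in triples
            if (s is None or t[0] == s)
            and (p is None or t[1] == p)
            and (o is None or t[2] == o)]


def features_for_patient(triples: List[Triple],
                         patient: str) -> List[Tuple[str, str, float]]:
    """Follow patient -> series -> feature links and collect feature values."""
    rows = []
    for _, _, series in match(triples, s=patient, p="ex:hasCTSeries"):
        for _, _, feature in match(triples, s=series, p="ex:hasGTVFeature"):
            name = match(triples, s=feature, p="ex:featureName")[0][2]
            value = match(triples, s=feature, p="ex:featureValue")[0][2]
            rows.append((series, name, float(value)))
    return rows
```

Because every record is expressed in the same three-part form, moving from a clinical statement to a DICOM header or a radiomics feature only requires following shared identifiers, which is the property the SPARQL queries rely on.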
Data format The accumulated RDF data are publicly accessible through a SPARQL endpoint where the triples are archived. The endpoint is queried remotely through a graph database web application at http://sparql.cancerdata.org. SPARQL queries are intrinsically federated, so clinical, DICOM, and radiomics data can be cross-referenced efficiently within a single query, independent of the original data format and coding system. Federated queries would work in the same way even if the RDF data were partitioned across multiple servers in dispersed physical locations.
Potential applications The public availability of these data resources is intended to support radiomics feature replication, repeatability, and reproducibility studies by the academic community. Readers may freely use and modify the example SPARQL queries to suit their research question. Data interoperability and reusability are supported by referencing existing public ontologies. The RDF data are readily findable and accessible through the aforementioned link. Scripts used to create the RDF are available at a code repository linked to this submission: https://gitlab.com/UM-CDS/FAIR-compliant_clinical_radiomics_and_DICOM_metadata.
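As an illustration of cross-referencing clinical, DICOM, and radiomics data in a single query, the sketch below builds a SPARQL SELECT query and wraps it in an HTTP POST request following the SPARQL 1.1 Protocol, using only the Python standard library. Several things here are assumptions rather than facts from the submission: the `ex:` prefix and all predicate names are hypothetical stand-ins for the real ontology terms, and the exact query path of the endpoint is not stated in the text, so `ENDPOINT` may need to be replaced with the address shown by the web application. The example queries shared with the submission remain the authoritative starting point.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Base address taken from the text. Assumption: the actual SPARQL query
# path may differ and should be taken from the web application itself.
ENDPOINT = "http://sparql.cancerdata.org"


def build_query(limit: int = 10) -> str:
    """Build a query joining clinical, DICOM, and radiomics triples.

    The ex: namespace and every predicate below are hypothetical
    placeholders, not terms from the published ontologies.
    """
    return f"""
PREFIX ex: <http://example.org/hypothetical#>
SELECT ?patient ?survival ?thickness ?featureName ?featureValue
WHERE {{
  ?patient ex:hasSurvivalDays ?survival ;
           ex:hasCTSeries ?series .
  ?series  ex:sliceThicknessMm ?thickness ;
           ex:hasGTVFeature ?feature .
  ?feature ex:featureName ?featureName ;
           ex:featureValue ?featureValue .
}}
LIMIT {int(limit)}
""".strip()


def build_request(query: str, endpoint: str = ENDPOINT) -> Request:
    """Wrap the query in a URL-encoded POST, asking for JSON results."""
    body = urlencode({"query": query}).encode("utf-8")
    headers = {
        "Accept": "application/sparql-results+json",
        "Content-Type": "application/x-www-form-urlencoded",
    }
    return Request(endpoint, data=body, headers=headers, method="POST")
```

The request is only constructed here, not sent. Passing it to `urllib.request.urlopen` would submit the query; with the placeholder predicates above it would not be expected to return rows from the real archive.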
