Correlative hierarchical clustering-based low-rank dimensionality reduction of radiomics-driven phenotype in non-small cell lung cancer

Background: Lung cancer is one of the most common cancers in the United States and the most fatal, with 142,670 deaths in 2019. Accurately determining tumor response is critical to clinical treatment decisions and ultimately affects patient survival. To better differentiate non-small cell lung cancer (NSCLC) patients who respond to therapy from those who do not, radiomic analysis is emerging as a promising approach for identifying associated imaging features that are undetectable by the human eye. However, the large number of variables extracted from an image may undermine the performance of computer-aided prognostic assessment, a problem known as the curse of dimensionality. In the present study, we show that correlation-driven hierarchical clustering improves feature selection and dimensionality reduction for high-dimensional radiomics data, ultimately predicting overall survival in NSCLC patients.

Methods: To select features from high-dimensional radiomics data, a correlation-incorporated hierarchical clustering algorithm automatically categorizes the features into several groups. The truncation distance in the resulting dendrogram controls this categorization. Low-rank dimensionality reduction is then applied within each cluster, providing descriptive features for Cox proportional hazards (CPH)-based survival analysis. From a publicly available NSCLC radiogenomic dataset containing CT images of 204 patients, 429 established radiomics features were extracted. Low-rank dimensionality reduction via principal component analysis (PCA) was employed (k = 1, n < 1) to find the representative component of each cluster of features, and cluster robustness was calculated using the relative weighted consistency metric.

Results: Using the correlation distance metric, hierarchical clustering categorized the radiomic features into several groups without requiring the number of clusters to be specified in advance; the resulting dendrogram was truncated at different distances. The dimensionality was reduced from 429 to 67 features at a truncation distance of 0.1. The robustness of the features within clusters ranged from -1.12 to -30.02 for truncation distances of 0.1 to 1.8, respectively, indicating that robustness decreases as the truncation distance increases and a smaller number of feature classes (i.e., clusters) is selected. The best multivariate CPH survival model had a C-statistic of 0.71 at a truncation distance of 0.1, outperforming conventional PCA approaches by 0.04 even when the same number of principal components was used.

Conclusions: The truncation distance of the correlative hierarchical clustering algorithm is directly associated with the robustness of the selected feature clusters, and the approach can effectively reduce feature dimensionality while improving outcome prediction.
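The pipeline described in the Methods (correlation-distance clustering of features, a dendrogram cut at a truncation distance, then one principal component per cluster) can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: the function name `cluster_and_reduce`, the average-linkage choice, the per-feature standardization, and the synthetic data are all assumptions made for the example, and the robustness metric and Cox model are not shown.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import pdist


def cluster_and_reduce(X, truncation_distance=0.1, method="average"):
    """Group correlated feature columns, then keep one PCA component per group.

    X is an (n_patients, n_features) array with no constant columns.
    Returns (Z, labels): Z has one column per cluster, and labels[j] is the
    cluster id assigned to feature j. The linkage method is an assumption;
    the abstract does not state which one was used.
    """
    # Standardize each feature so PCA is not dominated by feature scale.
    Xs = (X - X.mean(axis=0)) / X.std(axis=0)
    # Correlation distance (1 - Pearson r) between features, i.e. columns.
    dists = pdist(Xs.T, metric="correlation")
    tree = linkage(dists, method=method)
    # Cut the dendrogram: features merged below the threshold share a cluster.
    labels = fcluster(tree, t=truncation_distance, criterion="distance")
    components = []
    for c in np.unique(labels):
        block = Xs[:, labels == c]
        # Rank-1 PCA via SVD: first left singular vector times its singular
        # value gives the scores on the first principal component.
        U, S, _ = np.linalg.svd(block, full_matrices=False)
        components.append(U[:, 0] * S[0])
    return np.column_stack(components), labels


# Synthetic stand-in for a radiomics matrix: 3 latent signals, each observed
# through 4 noisy, highly correlated features, for 50 hypothetical patients.
rng = np.random.default_rng(0)
latent = rng.normal(size=(50, 3))
X = np.repeat(latent, 4, axis=1) + 0.05 * rng.normal(size=(50, 12))
Z, labels = cluster_and_reduce(X, truncation_distance=0.1)
print(Z.shape, labels)
```

The columns of `Z` play the role of the reduced feature set that would be passed to a survival model. Raising `truncation_distance` merges less strongly correlated features into the same cluster, which yields fewer and more heterogeneous clusters.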
