Stable prediction with radiomics data

Motivation: Radiomics refers to the high-throughput mining of quantitative features from radiographic images. It is a promising field in that it may provide a non-invasive solution for screening and classification. Standard machine learning classification and feature selection techniques, however, tend to perform poorly on radiomic data, both in predictive performance and in the stability of that performance. This is due to the heavy multicollinearity present in radiomic data. We set out to provide an easy-to-use approach that deals with this problem.

Results: We developed a four-step approach that projects the original high-dimensional feature space onto a lower-dimensional latent-feature space while retaining most of the covariation in the data. The steps are: (i) penalized maximum likelihood estimation of a redundancy-filtered correlation matrix; (ii) maximum likelihood factor analysis with the resulting matrix as input; (iii) production of a compact set of stable features from this two-stage maximum likelihood approach; and (iv) direct use of these features in any (regression-based) classifier or predictor. The approach outperforms other classification (and feature selection) techniques in both external and internal validation settings regarding survival in squamous cell cancers.
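As a rough illustration of how the four steps fit together, the following is a minimal NumPy sketch and not the authors' implementation. All function names, the greedy filtering rule, the threshold of 0.95, the penalty of 0.1 and the number of factors are hypothetical choices made for illustration. The shrinkage step uses a simple convex combination with the identity matrix as an assumed form of ridge-type regularization, and the factor extraction uses a principal-factor (eigendecomposition) approximation as a simpler stand-in for maximum likelihood factor analysis.

```python
import numpy as np


def redundancy_filter(R, threshold=0.95):
    """Greedily keep a feature only if its absolute correlation with
    every previously kept feature is below `threshold` (assumed rule)."""
    keep = []
    for j in range(R.shape[0]):
        if all(abs(R[j, k]) < threshold for k in keep):
            keep.append(j)
    return np.array(keep)


def ridge_correlation(R, penalty=0.1):
    """Shrink a correlation matrix toward the identity (assumed form).
    For 0 < penalty <= 1 the result is positive definite with unit diagonal."""
    return (1.0 - penalty) * R + penalty * np.eye(R.shape[0])


def principal_factor_loadings(R, n_factors):
    """Stand-in for ML factor analysis: loadings taken from the leading
    eigenpairs of the regularized correlation matrix."""
    eigvals, eigvecs = np.linalg.eigh(R)            # ascending eigenvalues
    top = np.argsort(eigvals)[::-1][:n_factors]     # indices of the largest
    return eigvecs[:, top] * np.sqrt(eigvals[top])


def latent_features(X, n_factors=3, threshold=0.95, penalty=0.1):
    """Steps (i)-(iii): filter, regularize, factor, and score."""
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)    # standardize columns
    R = np.corrcoef(Z, rowvar=False)
    keep = redundancy_filter(R, threshold)              # step (i), filter
    R_reg = ridge_correlation(R[np.ix_(keep, keep)], penalty)  # step (i), penalty
    loadings = principal_factor_loadings(R_reg, n_factors)     # step (ii)
    # Step (iii): regression-type factor scores, Z * R_reg^{-1} * loadings.
    scores = Z[:, keep] @ np.linalg.solve(R_reg, loadings)
    return scores, loadings, keep
```

For step (iv), the returned `scores` array (one row per sample, one column per latent feature) would replace the original feature matrix as input to whichever regression-based classifier or predictor is preferred. In practice the penalty and the number of factors would be chosen from the data rather than fixed as above.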
