Machine Learning on High Dimensional Shape Data from Subcortical Brain Surfaces: A Comparison of Feature Selection and Classification Methods

Recently, high-dimensional shape data (HDSD) has been demonstrated to be informative in describing subcortical brain morphometry in several disorders. While HDSD may serve as a biomarker of disease, its high dimensionality may require careful treatment in its application to machine learning. Here, we compare several possible approaches for feature selection and pattern classification using HDSD. We explore the efficacy of three candidate feature selection (FS) methods: Guided Random Forest (GRF), LASSO and no feature selection (NFS). Each feature set was applied to three classifiers: Random Forest (RF), Support Vector Machines (SVM) and Naive Bayes (NB). Each model was cross-validated on two diagnostic contrasts: Alzheimer's disease and mild cognitive impairment, each relative to matched controls. GRF and NFS outperformed LASSO as FS methods and performed comparably to each other. NB underperformed relative to RF and SVM, which were comparable in performance. Our results advocate the NFS-RF approach for its speed, simplicity and interpretability.
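The comparison design described above (three FS options crossed with three classifiers, each cross-validated) can be sketched in Python with scikit-learn. This is a minimal illustration under several assumptions and is not the pipeline used in the study: the data are synthetic, the sample and feature counts are hypothetical, accuracy under 5-fold stratified cross-validation is an assumed evaluation scheme, and all hyperparameters are arbitrary. scikit-learn does not provide Guided Random Forest, so a plain random-forest importance filter is swapped in as a rough stand-in for GRF.

```python
from itertools import product

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectFromModel
from sklearn.linear_model import Lasso
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for HDSD: few subjects, many features (sizes are hypothetical).
X, y = make_classification(
    n_samples=80, n_features=300, n_informative=15, random_state=0
)

# Feature selection options. "RF-importance" is NOT Guided Random Forest;
# it is a plain importance filter used here only as an approximation.
selectors = {
    "NFS": "passthrough",  # no feature selection
    "LASSO": SelectFromModel(Lasso(alpha=0.02, max_iter=5000)),
    "RF-importance": SelectFromModel(
        RandomForestClassifier(n_estimators=100, random_state=0)
    ),
}

classifiers = {
    "RF": RandomForestClassifier(n_estimators=100, random_state=0),
    "SVM": SVC(kernel="linear"),
    "NB": GaussianNB(),
}

# Assumed evaluation scheme: 5-fold stratified CV, scored by accuracy.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

results = {}
for (fs_name, selector), (clf_name, clf) in product(
    selectors.items(), classifiers.items()
):
    # Selection sits inside the pipeline so it is refit on each training
    # fold and never sees the held-out subjects.
    pipeline = Pipeline(
        [("scale", StandardScaler()), ("select", selector), ("classify", clf)]
    )
    scores = cross_val_score(pipeline, X, y, cv=cv, scoring="accuracy")
    results[(fs_name, clf_name)] = scores.mean()

for (fs_name, clf_name), mean_accuracy in sorted(results.items()):
    print(f"{fs_name:>13} + {clf_name:<3}  mean accuracy = {mean_accuracy:.3f}")
```

Placing the selector inside the pipeline is the main design choice in this sketch: with many more features than subjects, selecting features on the full data set before cross-validation would leak information from the test folds and inflate the estimated performance.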