An Explainable and Statistically Validated Ensemble Clustering Model Applied to the Identification of Traumatic Brain Injury Subgroups

We present a framework for an explainable and statistically validated ensemble clustering model applied to traumatic brain injury (TBI). The objective of our analysis is to identify patient subgroups by injury severity, along with the key phenotypes that delineate these subgroups, using varied clinical and computed tomography data. Explainable and statistically validated models are essential because data-driven identification of subgroups is an inherently multidisciplinary undertaking. In our case, this procedure yielded six patient subgroups that are distinct with respect to mechanism of injury, severity of presentation, anatomy, and psychometric and functional outcomes. The framework integrates statistical methods at several stages of the ensemble cluster analysis to enhance the quality and explainability of the results. The methodology is applicable to other clinical data sets that exhibit significant heterogeneity, as well as to other data science applications in biomedicine and elsewhere.
