Bi-level multi-source learning for heterogeneous block-wise missing data

Bio-imaging technologies allow scientists to collect large amounts of high-dimensional data from multiple heterogeneous sources for many biomedical applications. In the study of Alzheimer's Disease (AD), for example, neuroimaging data and gene/protein expression data are often analyzed together to improve predictive power. Joint learning from multiple complementary data sources is advantageous, but feature pruning and data source selection are critical for learning interpretable models from high-dimensional data. In addition, the collected data often have block-wise missing entries: in the Alzheimer's Disease Neuroimaging Initiative (ADNI), most subjects have MRI and genetic information, but only half have cerebrospinal fluid (CSF) measures, a different half have FDG-PET, and only some have proteomic data. Here we propose a way to effectively integrate information from multiple heterogeneous data sources when the data are block-wise missing. We present a unified "bi-level" learning model for complete multi-source data and extend it to incomplete data. Our major contributions are: (1) the proposed models unify feature-level and source-level analysis and include several existing feature learning approaches as special cases; (2) the model for incomplete data avoids imputing missing data and offers superior performance, and it generalizes to other applications with block-wise missing data sources; (3) we present efficient optimization algorithms for modeling both complete and incomplete data. We comprehensively evaluate the proposed models on all ADNI subjects with at least one of four data types at baseline: MRI, FDG-PET, CSF, and proteomics. The proposed models compare favorably with existing approaches.
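
To make the idea of unifying feature-level and source-level selection concrete, the sketch below shows one well-known bi-level penalty, the sparse group lasso, through its proximal operator. This is an illustration only and not necessarily the formulation used in this work: the function names, the penalty weights `lam_feature` and `lam_source`, and the toy coefficient vector are all assumptions introduced here. Each group stands for one data source; the element-wise step can zero individual features, and the group-wise step can zero an entire source.

```python
import numpy as np

def soft_threshold(v, t):
    """Element-wise soft-thresholding (feature-level shrinkage)."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def prox_sparse_group(v, groups, lam_feature, lam_source):
    """Proximal operator of lam_feature*||w||_1 + lam_source*sum_g ||w_g||_2.

    `groups` is a list of index arrays, one per (hypothetical) data source.
    """
    out = np.zeros_like(v, dtype=float)
    for idx in groups:
        # Feature level: shrink each coefficient, zeroing small ones.
        z = soft_threshold(v[idx], lam_feature)
        # Source level: shrink the whole block, zeroing it if its norm is small.
        norm = np.linalg.norm(z)
        if norm > lam_source:
            out[idx] = (1.0 - lam_source / norm) * z
    return out

# Toy example: two sources with three features each (made-up values).
v = np.array([3.0, 0.3, -2.0, 0.6, -0.5, 0.45])
groups = [np.array([0, 1, 2]), np.array([3, 4, 5])]
w = prox_sparse_group(v, groups, lam_feature=0.4, lam_source=0.5)
print(w)
```

In this toy run the first source is kept but its weak second feature is removed, while the second source is discarded as a whole, which is the kind of two-level sparsity pattern a bi-level model aims for.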
