FeaFiner: biomarker identification from medical data through feature generalization and selection

Traditionally, feature construction and feature selection are two important but separate processes in data mining. However, many real-world applications require an integrated approach for creating, refining and selecting features. To address this problem, we propose FeaFiner (short for Feature Refiner), an efficient formulation that simultaneously generalizes low-level features into higher-level concepts and selects relevant concepts based on the target variable. Specifically, we formulate a double sparsity optimization problem that identifies groups in the low-level features, generalizes higher-level features using the groups and performs feature selection. Since non-overlapping groups are preferred in many clinical studies for better interpretability, we further improve the formulation to generalize features using mutually exclusive feature groups. The proposed formulation is challenging to solve due to the orthogonality constraints, the non-convex objective and the non-smooth penalties. We apply a recently developed augmented Lagrangian method to solve this formulation, in which each subproblem is solved by a non-monotone spectral projected gradient method. Our numerical experiments show that this approach is computationally efficient and also capable of producing high-quality solutions. We also present a generalization bound showing the consistency and the asymptotic behavior of the learning process of our proposed formulation. Finally, the proposed FeaFiner method is validated on the Alzheimer's Disease Neuroimaging Initiative (ADNI) dataset, where low-level biomarkers are automatically generalized into robust higher-level concepts, which are then selected for predicting the disease status measured by the Mini Mental State Examination (MMSE) and the Alzheimer's Disease Assessment Scale cognitive subscore (ADAS-Cog). Compared to existing predictive modeling methods, FeaFiner provides intuitive and robust feature concepts and competitive predictive accuracy.
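
The abstract names a non-monotone spectral projected gradient (SPG) method as the inner solver of the augmented Lagrangian scheme. As a rough illustration of that inner solver only, the following is a minimal, generic SPG sketch in the style of Birgin, Martinez and Raydan, not the FeaFiner subproblem itself. It assumes a closed convex feasible set with a cheap Euclidean projection (here, the nonnegative orthant), uses a Barzilai-Borwein spectral step, and accepts a step by comparing against the maximum of the last few objective values rather than the current one. The names `spg`, `project_nonneg` and `make_least_squares`, the toy least-squares problem, and all parameter defaults are hypothetical choices for illustration; plain step halving stands in for the safeguarded quadratic interpolation of the original method.

```python
from typing import Callable, List, Sequence

Vector = List[float]


def dot(u: Sequence[float], v: Sequence[float]) -> float:
    return sum(a * b for a, b in zip(u, v))


def project_nonneg(x: Sequence[float]) -> Vector:
    """Euclidean projection onto the nonnegative orthant {x : x >= 0}."""
    return [max(0.0, xi) for xi in x]


def make_least_squares(A: List[Vector], b: Vector):
    """Hypothetical toy objective f(x) = 0.5 * ||A x - b||^2 and its gradient."""
    def residual(x: Vector) -> Vector:
        return [dot(row, x) - bi for row, bi in zip(A, b)]

    def f(x: Vector) -> float:
        r = residual(x)
        return 0.5 * dot(r, r)

    def grad(x: Vector) -> Vector:
        r = residual(x)  # gradient is A^T (A x - b)
        return [sum(row[j] * ri for row, ri in zip(A, r)) for j in range(len(x))]

    return f, grad


def spg(f: Callable[[Vector], float], grad: Callable[[Vector], Vector],
        project: Callable[[Vector], Vector], x0: Vector, memory: int = 10,
        gamma: float = 1e-4, alpha_min: float = 1e-10, alpha_max: float = 1e10,
        tol: float = 1e-8, max_iter: int = 1000) -> Vector:
    """Generic non-monotone spectral projected gradient method (sketch)."""
    x = project(x0)
    g = grad(x)
    history = [f(x)]  # recent objective values used by the non-monotone test
    alpha = 1.0       # initial spectral step length
    for _ in range(max_iter):
        # Stop when the projected gradient residual P(x - g) - x is (almost) zero.
        pg = [p - xi for p, xi in zip(project([xi - gi for xi, gi in zip(x, g)]), x)]
        if max(abs(v) for v in pg) < tol:
            break
        # Search direction: projected step using the current spectral step length.
        trial = project([xi - alpha * gi for xi, gi in zip(x, g)])
        d = [ti - xi for ti, xi in zip(trial, x)]
        gd = dot(g, d)                  # negative whenever d is nonzero
        f_ref = max(history[-memory:])  # non-monotone reference value
        lam = 1.0
        for _step in range(60):         # backtrack by halving; last trial kept as a safeguard
            x_new = [xi + lam * di for xi, di in zip(x, d)]
            f_new = f(x_new)
            if f_new <= f_ref + gamma * lam * gd:
                break
            lam *= 0.5
        g_new = grad(x_new)
        # Barzilai-Borwein step length from s = x_new - x and y = g_new - g.
        s = [a - c for a, c in zip(x_new, x)]
        y = [a - c for a, c in zip(g_new, g)]
        sy = dot(s, y)
        alpha = alpha_max if sy <= 0 else min(alpha_max, max(alpha_min, dot(s, s) / sy))
        x, g = x_new, g_new
        history.append(f_new)
    return x


# Usage: nonnegative least squares whose unconstrained minimizer (1, -1) is infeasible.
A = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
b = [1.0, -1.0, 0.0]
f, grad = make_least_squares(A, b)
x_star = spg(f, grad, project_nonneg, [0.0, 0.0])
print(x_star)  # [0.5, 0.0]
```

The non-monotone reference value lets the method take occasional uphill steps, which tends to preserve the fast progress of the spectral step length; swapping `project_nonneg` for another cheap projection changes the feasible set without touching the rest of the loop.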
