Classification of Big Data With Application to Imaging Genetics

Big data applications, such as medical imaging and genetics, typically generate datasets that consist of few observations n on many more variables p, a scenario that we denote as p ≫ n. Traditional data processing methods are often insufficient for extracting information from such data, which calls for new algorithms that can handle the size, complexity, and special structure of these datasets. In this paper, we consider the problem of classifying p ≫ n data and propose a classification method based on linear discriminant analysis (LDA). Traditional LDA depends on an estimate of the data covariance, but when p ≫ n the sample covariance estimate is singular. The proposed method instead estimates the covariance using a sparse version of noisy principal component analysis (nPCA). Sparsity in this setting aims at automatically selecting the variables that are relevant for classification. In experiments, the new method is compared to state-of-the-art methods for big data problems using both simulated datasets and imaging genetics datasets.
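
To make the singularity problem concrete, the following is a minimal sketch, not the method proposed in the paper. It assumes a plain (non-sparse) low-rank-plus-noise covariance model, Sigma = G G^T + sigma2 * I, fitted with the closed-form estimates familiar from probabilistic PCA; the sparsity penalty and the tuning-parameter selection of the proposed method are not reproduced. All function names and the rank r are hypothetical choices made for illustration. Because Sigma has this structure, it is invertible even when p ≫ n, and the Woodbury identity lets the LDA direction be computed by solving only an r x r system.

```python
import numpy as np


def lowrank_noise_covariance(Xc, r):
    """Fit Sigma = G G^T + sigma2 * I to centered data Xc (n x p).

    Assumption: r is smaller than the rank of Xc, so sigma2 > 0.
    """
    n, p = Xc.shape
    _, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    eig = s**2 / n                    # nonzero eigenvalues of the sample covariance
    sigma2 = eig[r:].sum() / (p - r)  # average discarded variance (remaining eigenvalues are zero)
    G = Vt[:r].T * np.sqrt(np.maximum(eig[:r] - sigma2, 0.0))
    return G, sigma2


def solve_sigma(G, sigma2, v):
    """Compute Sigma^{-1} v with the Woodbury identity (r x r solve only)."""
    r = G.shape[1]
    small = sigma2 * np.eye(r) + G.T @ G
    return (v - G @ np.linalg.solve(small, G.T @ v)) / sigma2


def fit_lda(X, y, r):
    """Two-class LDA (labels 0/1, equal priors assumed) using the model above."""
    mu0, mu1 = X[y == 0].mean(axis=0), X[y == 1].mean(axis=0)
    # Center each observation by its own class mean (pooled within-class scatter).
    Xc = np.where((y == 1)[:, None], X - mu1, X - mu0)
    G, sigma2 = lowrank_noise_covariance(Xc, r)
    w = solve_sigma(G, sigma2, mu1 - mu0)  # discriminant direction
    b = -w @ (mu0 + mu1) / 2.0             # threshold at the midpoint of the class means
    return w, b


def predict(X, w, b):
    """Assign class 1 where the linear discriminant score is positive."""
    return (X @ w + b > 0).astype(int)


# Simulated p >> n example: 20 observations, 200 variables, 5 informative ones.
rng = np.random.default_rng(0)
n, p, r = 20, 200, 3
y = np.repeat([0, 1], n // 2)
X = rng.standard_normal((n, p))
X[y == 1, :5] += 2.0

Xc = X - X.mean(axis=0)
sample_cov_rank = np.linalg.matrix_rank(Xc.T @ Xc / n)  # at most n - 1, so singular
w, b = fit_lda(X, y, r)
labels = predict(X, w, b)
```

In this sketch every variable receives a nonzero weight in w. The sparse nPCA estimate described in the abstract is intended to go further by driving the loadings of irrelevant variables to zero, which is what provides the variable selection.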
