Multi-scale Mining of fMRI Data with Hierarchical Structured Sparsity

Inverse inference, or "brain reading", is a recent paradigm for analyzing functional magnetic resonance imaging (fMRI) data, based on pattern recognition tools. By predicting cognitive variables from brain activation maps, this approach aims at decoding brain activity. Inverse inference takes into account the multivariate information between voxels and is currently the only way to assess how precisely some cognitive information is encoded by the activity of neural populations within the whole brain. However, it relies on a prediction function that is plagued by the curse of dimensionality, since there are far more features than samples, i.e., more voxels than fMRI volumes. Several methods have been proposed to address this problem, among them univariate feature selection, feature agglomeration, and regularization techniques. In this paper, we consider a hierarchical structured regularization. Specifically, the penalization we use is constructed from a tree that is obtained by spatially constrained agglomerative clustering. This approach encodes spatial prior information in the regularization process, which makes the overall prediction procedure more robust to inter-subject variability. We test our algorithm on real data acquired to study the mental representation of objects, and we show that the proposed algorithm yields better prediction accuracy than reference methods.
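
To make the construction concrete, the following is a minimal sketch of how a tree-structured penalty could be built from spatially constrained agglomerative clustering. It is an illustration under stated assumptions, not the implementation used in the paper: the voxels are assumed to lie on a one-dimensional chain (so only neighboring clusters may merge), Ward's criterion is assumed as the merging rule, and the penalty is assumed to be an unweighted sum of l2 norms over all tree nodes. The function names `constrained_ward_tree` and `tree_penalty` are hypothetical.

```python
import math


def constrained_ward_tree(X):
    """Spatially constrained Ward clustering on a 1-D chain of voxels.

    X is a list with one entry per voxel; each entry is that voxel's
    list of values across samples. Only clusters that are adjacent on
    the chain may merge (the toy spatial constraint assumed here).
    Returns one group (tuple of voxel indices) per tree node:
    leaves first, then each merged parent, root last.
    """
    clusters = [[i] for i in range(len(X))]  # kept in chain order
    groups = [tuple(c) for c in clusters]
    dim = len(X[0])

    def mean(c):
        return [sum(X[i][d] for i in c) / len(c) for d in range(dim)]

    def ward_cost(a, b):
        # Increase in within-cluster variance caused by merging a and b.
        d2 = sum((u - v) ** 2 for u, v in zip(mean(a), mean(b)))
        return len(a) * len(b) / (len(a) + len(b)) * d2

    while len(clusters) > 1:
        costs = [ward_cost(clusters[k], clusters[k + 1])
                 for k in range(len(clusters) - 1)]
        k = costs.index(min(costs))  # cheapest adjacent merge
        merged = clusters[k] + clusters[k + 1]
        clusters[k:k + 2] = [merged]
        groups.append(tuple(merged))
    return groups


def tree_penalty(w, groups):
    """Sum of l2 norms of w restricted to each tree node (unit weights assumed)."""
    return sum(math.sqrt(sum(w[i] ** 2 for i in g)) for g in groups)


# Toy data: four voxels, one sample each; voxels 0-1 and 2-3 are similar.
X = [[0.0], [0.1], [5.0], [5.2]]
groups = constrained_ward_tree(X)
penalty = tree_penalty([3.0, 4.0, 0.0, 0.0], groups)
```

Because every voxel belongs to its leaf, to each of its ancestors, and to the root, a weight vector is penalized at several spatial scales at once, which is the sense in which the regularization is hierarchical. In a real setting the chain would be replaced by the 3-D voxel neighborhood graph, and the node weights and choice of norm would need to follow the paper.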
