Interpreting single trial data using groupwise regularisation

Univariate statistical approaches are often used to analyse neuroimaging data, but they cannot detect subtle interactions between different components of brain activity. In contrast, multivariate approaches based on classification are well suited to detecting such interactions, which allows neuroimaging data to be analysed at the single-trial level. However, multivariate approaches typically assign a non-zero contribution to every component, which makes the results difficult to interpret. This paper introduces groupwise regularisation as a novel method for finding sparse, and therefore easy to interpret, models that predict the experimental condition to which single trials belong. Furthermore, the obtained models can be constrained in various ways by placing features extracted from the data that are thought to belong together into groups. To learn models from data, we introduce a new algorithm that makes use of stability conditions derived in this paper. The algorithm is used to classify multisensor EEG signals recorded during a motor imagery task, with (groupwise) regularised logistic regression as the underlying classifier. We show that regularisation dramatically reduces the number of features without reducing the classification rate. This improves model interpretability, because the sparse models recover known features of the data such as mu and beta desynchronisation in the motor cortex contralateral to the imagined movement. By choosing particular groupings, we can constrain the regularised solutions so that fewer sensors are used or so that the resulting model generalises well across subjects. The identification of a small number of groups of features that best explain the data makes groupwise regularisation a useful new tool for single-trial analysis.
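To make the idea of groupwise regularised logistic regression concrete, the following is a minimal sketch in Python. It is not the algorithm introduced in the paper, whose stability conditions are not reproduced in this abstract; instead it assumes a standard proximal gradient scheme with a group lasso penalty. The function names (`group_soft_threshold`, `fit_group_logreg`), the synthetic data, and the value of the penalty weight `lam` are all illustrative assumptions.

```python
import numpy as np

def group_soft_threshold(w, groups, thresh):
    """Shrink each group's weights towards zero; groups with L2 norm <= thresh become exactly zero."""
    out = np.zeros_like(w)
    for idx in groups:
        norm = np.linalg.norm(w[idx])
        if norm > thresh:
            out[idx] = (1.0 - thresh / norm) * w[idx]
    return out

def fit_group_logreg(X, y, groups, lam, n_iter=500):
    """Proximal gradient descent for logistic regression with a group lasso penalty.

    Assumption: this is a generic solver chosen for illustration,
    not the stability-condition-based algorithm of the paper.
    """
    n, d = X.shape
    w = np.zeros(d)
    # Step size 1/L, where L = ||X||_2^2 / (4n) bounds the curvature of the mean logistic loss.
    step = 4.0 * n / (np.linalg.norm(X, 2) ** 2)
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-(X @ w)))   # predicted probability of class 1
        grad = X.T @ (p - y) / n             # gradient of the mean logistic loss
        w = group_soft_threshold(w - step * grad, groups, step * lam)
    return w

# Synthetic stand-in for trial features: 4 groups of 3 features (e.g. one group per sensor);
# only the first group carries information about the class label.
rng = np.random.default_rng(0)
X = rng.standard_normal((200, 12))
groups = [np.arange(i, i + 3) for i in range(0, 12, 3)]
y = (X[:, :3] @ np.array([2.0, -1.5, 1.0]) > 0).astype(float)

w = fit_group_logreg(X, y, groups, lam=0.15)
active = [g for g, idx in enumerate(groups) if np.any(w[idx] != 0)]
print("groups with non-zero weights:", active)
```

Because the penalty acts on the norm of each group rather than on individual weights, whole groups are either kept or discarded together. Grouping features by sensor would therefore tend to reduce the number of sensors used, which corresponds to one of the constraints described above.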