Generative embeddings based on Rician mixtures for kernel-based classification of magnetic resonance images

Classical approaches to learning classifiers for structured objects (such as images or sequences) are based on probabilistic generative models. State-of-the-art classifiers for vectorial data, on the other hand, are learned discriminatively. In recent years, these two paradigms have been combined through generative embeddings, of which the Fisher kernel is arguably the best-known example. These embeddings are mappings from the object space into a fixed-dimensional score space, induced by a generative model learned from data, on which a (possibly kernel-based) discriminative approach can then be used. This paper proposes a new semi-parametric approach to building generative embeddings for the classification of magnetic resonance images (MRI). Because MRI data are well described by Rice distributions, we use Rician mixtures as the underlying generative model and build several different generative embeddings from it. These embeddings yield vectorial representations on which kernel-based support vector machines (SVMs) can be trained for classification. As kernels, we adopt the recently proposed nonextensive information theoretic kernels. The proposed methodology was tested on a challenging classification task: deciding whether a magnetic resonance image belongs to a schizophrenic or a non-schizophrenic human subject. The classification is based on a set of regions of interest (ROIs) in each image, and the classifiers corresponding to the individual ROIs are combined via AdaBoost. The experimental results show that the proposed methodology outperforms the previous state-of-the-art methods on the same dataset.
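To make the idea of a generative embedding concrete, the following is a minimal sketch, not the authors' implementation. It assumes a Rician mixture whose parameters have already been estimated (for instance by EM) and maps an ROI, given as a list of positive voxel intensities, to the average posterior probability of each mixture component. This is only one plausible embedding; the abstract does not specify which embeddings the paper builds. The function names (`bessel_i0`, `rice_pdf`, `roi_embedding`) and all parameter values are hypothetical and chosen for illustration. The Rice density used is f(x | v, sigma) = (x / sigma^2) exp(-(x^2 + v^2) / (2 sigma^2)) I0(x v / sigma^2), with I0 the modified Bessel function of the first kind of order zero.

```python
import math


def bessel_i0(z, terms=60):
    """Modified Bessel function I0 via its power series.

    Adequate for the moderate arguments used in this sketch; a
    production implementation would use a scaled, overflow-safe routine.
    """
    total, term = 0.0, 1.0
    q = (z * z) / 4.0
    for k in range(terms):
        total += term
        term *= q / ((k + 1) ** 2)  # q^(k+1) / ((k+1)!)^2
    return total


def rice_pdf(x, v, sigma):
    """Rice density with noncentrality v and scale sigma, for x > 0."""
    s2 = sigma * sigma
    return (x / s2) * math.exp(-(x * x + v * v) / (2.0 * s2)) * bessel_i0(x * v / s2)


def roi_embedding(intensities, weights, vs, sigmas):
    """Map a set of intensities to a fixed-length vector.

    Component k of the result is the posterior probability of mixture
    component k, averaged over all intensities (assumed: at least one
    component gives each intensity a nonzero likelihood).
    """
    n_components = len(weights)
    embedding = [0.0] * n_components
    for x in intensities:
        joint = [w * rice_pdf(x, v, s) for w, v, s in zip(weights, vs, sigmas)]
        total = sum(joint)
        for k in range(n_components):
            embedding[k] += joint[k] / total
    return [e / len(intensities) for e in embedding]


# Hypothetical two-component mixture (illustrative values only).
weights = [0.5, 0.5]
vs = [1.0, 5.0]
sigmas = [1.0, 1.0]

low_roi = roi_embedding([0.8, 1.2, 1.5], weights, vs, sigmas)
high_roi = roi_embedding([4.5, 5.2, 5.8], weights, vs, sigmas)
print(low_roi, high_roi)
```

Whatever the number of voxels in an ROI, the output has one entry per mixture component, and the entries are non-negative and sum to one. Vectors of this kind can be compared with kernels defined on probability distributions, which is the role the nonextensive information theoretic kernels play in the pipeline described above.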
