Variational and scale mixture representations of non-Gaussian densities for estimation in the Bayesian linear model: Sparse coding, independent component analysis, and minimum entropy segmentation

Author(s): Palmer, Jason Allan | Abstract: This thesis considers representations of non-Gaussian probability densities for use in various estimation problems associated with the Bayesian linear model. We define a class of densities that we call strongly super-Gaussian, and show the relationship of these densities to Gaussian scale mixtures and to densities with positive kurtosis. Such densities have been used to model "sparse" random variables, i.e., densities that are sharply peaked with heavy tails. We show that strongly super-Gaussian densities are natural generalizations of Gaussian densities, and that they permit the derivation of monotonic iterative algorithms for parameter estimation in sparse coding in overcomplete signal dictionaries, blind source separation, independent component analysis, and blind multichannel deconvolution. Mixtures of strongly super-Gaussian densities can be used to model arbitrary densities with greater economy than a Gaussian mixture model. The framework is extended to multivariate dependency models for independent subspace analysis. We apply the methods to the estimation of neural electromagnetic sources from electroencephalogram recordings, and to sparse coding of images.
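
To make the abstract's central objects concrete, the following is a brief sketch of the two representations it refers to, in standard notation; the symbols g, xi, and mu below are illustrative conventions, not necessarily those of the thesis. A Gaussian scale mixture expresses a symmetric density as a continuous mixture of zero-mean Gaussians over a scale variable, while the strongly super-Gaussian property yields a variational (supremum) Gaussian representation of the same kind via concave conjugate duality:

\[
p(x) \;=\; \int_0^{\infty} \mathcal{N}\!\bigl(x;\,0,\,\xi\bigr)\,d\mu(\xi)
\qquad \text{(Gaussian scale mixture)},
\]
\[
p(x) \;=\; e^{-g(x^2)},\;\; g \text{ concave and increasing}
\;\;\Longrightarrow\;\;
p(x) \;=\; \sup_{\xi > 0}\, \exp\!\bigl(-\xi x^2 + g^{*}(\xi)\bigr),
\]
where \(g^{*}\) is the concave conjugate of \(g\) and the supremum is attained at \(\xi = g'(x^2)\).

In MAP estimation for the linear model \(y = As + v\) with such a prior on the coefficients \(s\), bounding the prior at the current iterate in this way gives a majorize-minimize scheme whose inner step is a reweighted ridge regression,

\[
s^{(k+1)} \;=\; \arg\min_{s}\; \frac{1}{2\sigma^2}\,\|y - As\|_2^2 \;+\; \sum_i \xi_i^{(k)} s_i^2,
\qquad
\xi_i^{(k)} \;=\; g'\!\bigl((s_i^{(k)})^2\bigr),
\]

which decreases the MAP objective monotonically. For the Laplacian prior, \(g(u) = \sqrt{u}\), the weights reduce to \(\xi_i = 1/(2|s_i|)\), the familiar FOCUSS/IRLS reweighting. This is a minimal sketch of the mechanism behind the "monotonic iterative algorithms" mentioned above, not a statement of the thesis's exact algorithms.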
