Variational Inference for Watson Mixture Model

This paper addresses modeling data with the Watson distribution, one of the simplest distributions for analyzing axially symmetric data. The distribution has attracted attention in recent years because of its modeling capability. However, its Bayesian inference remains understudied because the normalization factor, which involves Kummer's confluent hypergeometric function, is difficult to handle. Recently developed Markov chain Monte Carlo (MCMC) sampling methods can be applied for this purpose, but they can be prohibitively slow for practical applications. Variational methods, which convert inference problems into optimization problems, provide a deterministic alternative. In this paper, we present variational inference for Watson mixture models. First, the variational framework is used to sidestep the intractability arising from the coupling of latent states and parameters. Second, the variational free energy is further lower bounded to avoid intractable moment computations. The proposed approach provides a lower bound on the log marginal likelihood and retains distributional information over all parameters. Moreover, we show that it can regulate its own complexity by pruning unnecessary mixture components while avoiding overfitting. We discuss potential applications of Watson-distribution models to blind source separation and to clustering of gene expression data sets.
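
To make the normalization issue concrete, here is a minimal sketch assuming the standard parameterization of the Watson density on the unit sphere S^(p-1) in R^p: W(x | mu, kappa) = c_p(kappa) exp(kappa (mu^T x)^2), with c_p(kappa) = Gamma(p/2) / (2 pi^(p/2) M(1/2, p/2, kappa)), where M is Kummer's confluent hypergeometric function. The code uses NumPy and SciPy to evaluate this log-density and the log-density of a K-component mixture. The helper names `watson_log_normalizer`, `watson_logpdf`, and `watson_mixture_logpdf` are hypothetical. This is only the model density, not the paper's variational inference procedure.

```python
import numpy as np
from scipy.special import gammaln, hyp1f1, logsumexp


def watson_log_normalizer(kappa, p):
    """Log of c_p(kappa) = Gamma(p/2) / (2 pi^(p/2) M(1/2, p/2, kappa)).

    M is Kummer's confluent hypergeometric function 1F1, evaluated here with
    scipy.special.hyp1f1. For large positive kappa, M grows roughly like
    exp(kappa) and this direct evaluation overflows; a robust version would
    need a log-domain or asymptotic evaluation of M.
    """
    return (gammaln(p / 2.0) - np.log(2.0) - (p / 2.0) * np.log(np.pi)
            - np.log(hyp1f1(0.5, p / 2.0, kappa)))


def watson_logpdf(x, mu, kappa):
    """Watson log-density of unit vector(s) x (shape (p,) or (n, p)) given a
    unit mean axis mu (shape (p,)) and concentration kappa."""
    x = np.asarray(x, dtype=float)
    mu = np.asarray(mu, dtype=float)
    t = x @ mu  # cosine between each x and the mean axis
    # Only t**2 enters the exponent, so x and -x receive the same density.
    return watson_log_normalizer(kappa, mu.shape[0]) + kappa * t ** 2


def watson_mixture_logpdf(x, weights, mus, kappas):
    """Log-density of a K-component Watson mixture at points x of shape (n, p)."""
    x = np.atleast_2d(np.asarray(x, dtype=float))
    # Column k holds log(pi_k) + log W(x | mu_k, kappa_k); shape (n, K).
    comp = np.stack([np.log(w) + watson_logpdf(x, m, k)
                     for w, m, k in zip(weights, mus, kappas)], axis=1)
    # log sum_k pi_k W(x | mu_k, kappa_k), computed stably with logsumexp.
    return logsumexp(comp, axis=1)
```

For example, `watson_logpdf(np.array([[0.6, 0.0, 0.8], [-0.6, 0.0, -0.8]]), np.array([0.0, 0.0, 1.0]), 5.0)` returns two equal values, reflecting the antipodal (axial) symmetry of the distribution. The density itself is cheap once c_p(kappa) is known; the difficulty noted above arises because kappa appears inside M, so expectations of log c_p(kappa) under a posterior over kappa have no simple closed form.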
