Variational Bayesian Matching

Matching of samples refers to the problem of inferring unknown co-occurrence or alignment between observations in two data sets. Given two sets with equally many samples, the task is to find for each sample a representative sample in the other set, without prior knowledge of a distance measure between the sets. Recently, a few alternative solutions have been suggested, based on maximization of joint likelihood or on various measures of between-data statistical dependency. In this work we present a variational Bayesian solution to the problem, learning a Bayesian canonical correlation analysis model with a permutation parameter for re-ordering the samples in one of the sets. We approximate the posterior over the permutations, and demonstrate that the resulting matching algorithm clearly outperforms all of the earlier solutions.
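To make the problem statement concrete, the following is a minimal sketch of sample matching as a search over permutations. It is not the variational Bayesian method of this work: it simply picks the re-ordering of one set that maximizes the absolute Pearson correlation with the other, which is one crude instance of a between-data dependency measure. The toy data, the function names `pearson` and `best_matching`, and the brute-force search (feasible only for a handful of samples) are all assumptions made for illustration.

```python
from itertools import permutations


def pearson(a, b):
    """Pearson correlation between two equally long sequences of floats."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((u - mean_a) * (v - mean_b) for u, v in zip(a, b))
    var_a = sum((u - mean_a) ** 2 for u in a)
    var_b = sum((v - mean_b) ** 2 for v in b)
    return cov / (var_a * var_b) ** 0.5


def best_matching(x, y):
    """Return perm such that y[perm[i]] is matched to x[i].

    Brute force over all n! permutations, scoring each by the absolute
    correlation between x and the re-ordered y. Illustrative only.
    """
    best = max(
        permutations(range(len(y))),
        key=lambda p: abs(pearson(x, [y[j] for j in p])),
    )
    return list(best)


# Hypothetical toy data: two one-dimensional "views" of a shared latent value.
latent = [0.1, 0.5, 0.9, 1.8, 3.0, 4.7]
noise = [0.02, -0.01, 0.03, -0.02, 0.01, -0.03]
x = latent
y_aligned = [2.0 * z + 1.0 + e for z, e in zip(latent, noise)]

# Hide the alignment by shuffling the second view with a known order.
shuffle = [3, 0, 5, 1, 4, 2]
y_shuffled = [y_aligned[k] for k in shuffle]

# The permutation that undoes the shuffle: true_perm[i] is the position
# in y_shuffled that holds the partner of x[i].
true_perm = [shuffle.index(i) for i in range(len(shuffle))]

recovered = best_matching(x, y_shuffled)
print(recovered == true_perm)
```

The point estimate above commits to a single permutation, whereas the approach described in this work approximates a posterior over permutations inside a Bayesian CCA model. For more than a few samples, exhaustive search is infeasible and an assignment solver or an approximate inference scheme would be needed.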
