Local dependent components

We introduce a mixture of probabilistic canonical correlation analyzers for modeling local correlations, or more generally mutual statistical dependencies, in co-occurring data pairs. The model extends classical canonical correlation analysis and its probabilistic interpretation in three main ways. First, a fully Bayesian treatment enables analysis of small samples (the "large p, small n" setting, a crucial problem in bioinformatics, for instance) and rigorous estimation of the degree of dependence and independence. Second, the mixture formulation generalizes the method from global linearity to the more reasonable assumption that different kinds of data exhibit different kinds of dependencies. Third, as a novel extension, the method decomposes the variation in the data into shared and data-set-specific components.
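
The contrast between global and local dependency can be illustrated with a minimal sketch. It is not the Bayesian mixture model described above: it uses classical CCA (whitening followed by a singular value decomposition), and it assumes the cluster labels are known, whereas the model would infer them. The function names, the synthetic data, and all parameter values are hypothetical choices made for illustration only.

```python
import numpy as np


def canonical_correlations(x, y, reg=1e-8):
    """Classical (non-Bayesian) CCA: canonical correlations of paired rows of x and y."""
    x = x - x.mean(axis=0)
    y = y - y.mean(axis=0)
    n = x.shape[0]
    # Sample covariances; a small ridge term (assumption) keeps them invertible.
    cxx = x.T @ x / (n - 1) + reg * np.eye(x.shape[1])
    cyy = y.T @ y / (n - 1) + reg * np.eye(y.shape[1])
    cxy = x.T @ y / (n - 1)
    # Whiten each view with its inverse Cholesky factor; the singular values
    # of the whitened cross-covariance are the canonical correlations.
    lx = np.linalg.cholesky(cxx)
    ly = np.linalg.cholesky(cyy)
    whitened = np.linalg.solve(lx, cxy) @ np.linalg.inv(ly).T
    rho = np.linalg.svd(whitened, compute_uv=False)
    return np.clip(rho, 0.0, 1.0)


def demo(seed=0, n_per_cluster=2000, noise=0.3):
    """Synthetic pairs whose dependency flips sign between two known clusters."""
    rng = np.random.default_rng(seed)
    w_x = np.array([1.0, 0.5, -0.5])   # hypothetical loading of the shared source in x
    w_y = np.array([1.0, -1.0])        # hypothetical loading of the shared source in y
    labels = np.repeat([0, 1], n_per_cluster)
    sign = np.where(labels == 0, 1.0, -1.0)
    z = rng.standard_normal(labels.size)  # shared latent source
    x = np.outer(z, w_x) + noise * rng.standard_normal((labels.size, 3))
    y = np.outer(sign * z, w_y) + noise * rng.standard_normal((labels.size, 2))
    # One global linear model versus one model per (known) cluster.
    global_rho = canonical_correlations(x, y)
    local_rho = [canonical_correlations(x[labels == k], y[labels == k]) for k in (0, 1)]
    return global_rho, local_rho
```

Under these synthetic settings, `demo()` should return a leading global canonical correlation close to zero, because the two opposite-signed dependencies cancel, while the leading correlation within each cluster stays high. This is the kind of structure a single global linear model misses and a mixture formulation is meant to capture.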
