Generative Models that Discover Dependencies Between Data Sets

We develop models for a kind of data fusion task: Combine multiple data sources under the assumption that data set specific variation is irrelevant and only between-data variation is relevant. We extend a recent generative modeling interpretation of Canonical Correlation Analysis (CCA), a traditional linear method applicable to this task, in a way which allows generalization to other types of models. The generative formulation makes all standard tools of Bayesian inference applicable. We finally introduce new dependency- seeking clustering models that outperform standard generative clustering models in their task.