From learning metrics towards dependency exploration

We have recently introduced new kinds of data fusion techniques whose goal is to find what is shared by data sets, instead of modeling all variation in the data. They extend our earlier work on learning distance metrics, discriminative clustering, and other supervised statistical data mining methods. In the new methods the supervision is symmetric, which translates into mining dependencies between the data sets. So far we have introduced methods for associative clustering and for extracting dependent components, which generalize classical canonical correlation analysis.
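For reference, classical canonical correlation analysis (Hotelling's method, which the dependent-component methods generalize) finds linear projections of two paired data sets that are maximally correlated. In other words, it keeps only the variation the sets share. The following is a minimal NumPy sketch of the textbook formulation (whitening followed by an SVD), not the authors' dependent-components method. The function name `cca` and the small ridge term `reg` are our own assumptions, and the synthetic data are purely illustrative.

```python
import numpy as np

def cca(X, Y, n_components=1, reg=1e-6):
    """Classical CCA via whitening + SVD (illustrative sketch).

    X: (n, p) and Y: (n, q) are paired samples (row i of X matches row i of Y).
    `reg` is a small ridge added to the covariances; it is an assumption
    made here for numerical stability, not part of the classical method.
    Returns the canonical correlations and projection matrices Wx, Wy.
    """
    X = X - X.mean(axis=0)
    Y = Y - Y.mean(axis=0)
    n = X.shape[0]
    Cxx = X.T @ X / (n - 1) + reg * np.eye(X.shape[1])
    Cyy = Y.T @ Y / (n - 1) + reg * np.eye(Y.shape[1])
    Cxy = X.T @ Y / (n - 1)

    def inv_sqrt(C):
        # Symmetric inverse square root via eigendecomposition.
        vals, vecs = np.linalg.eigh(C)
        return vecs @ np.diag(1.0 / np.sqrt(vals)) @ vecs.T

    Kx, Ky = inv_sqrt(Cxx), inv_sqrt(Cyy)
    # Singular values of the whitened cross-covariance are the canonical correlations.
    U, s, Vt = np.linalg.svd(Kx @ Cxy @ Ky)
    k = n_components
    Wx = Kx @ U[:, :k]      # canonical directions for X
    Wy = Ky @ Vt.T[:, :k]   # canonical directions for Y
    return s[:k], Wx, Wy

# Synthetic example: one shared latent signal z, plus large-variance
# variation that is specific to each data set.
rng = np.random.default_rng(0)
z = rng.normal(size=(500, 1))
X = np.hstack([z + 0.1 * rng.normal(size=(500, 1)), 5 * rng.normal(size=(500, 2))])
Y = np.hstack([5 * rng.normal(size=(500, 2)), -z + 0.1 * rng.normal(size=(500, 1))])

corr, Wx, Wy = cca(X, Y)
print(corr)  # close to 1: the shared component is recovered
```

The data-set-specific columns have much larger variance than the shared signal, so a method that models all variation (for example, PCA on either set) would be dominated by them. CCA ignores that variation and picks out the weakly varying but shared component. This is the "what is shared" criterion in its simplest linear, Gaussian-motivated form.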
