Principle of Learning Metrics for Exploratory Data Analysis

Visualization and clustering of multivariate data are usually based on mutual distances between samples, measured by heuristics such as the Euclidean distance between vectors of extracted features. Our recently developed methods remove this arbitrariness by learning to measure the important differences. The effect is equivalent to changing the metric of the data space. It is assumed that variation in the data is important only to the extent that it causes variation in auxiliary data, available paired with the primary data. The learning of the metric is supervised by the auxiliary data, whereas the data analysis in the new metric is unsupervised. We review two approaches: a clustering algorithm and another that is based on an explicitly generated metric. Applications so far have been in the exploratory analysis of texts, gene function, and bankruptcy. Relationships between the two approaches are derived, leading to promising new approaches to the clustering problem.
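To make the principle concrete, the following is a minimal sketch, not the authors' implementation, and all names in it are illustrative. Assume a parametric estimate of the conditional auxiliary distribution p(c|x), here a multinomial logistic model fitted to the paired labels, and define local distances through its Fisher information matrix, d^2(x, x+dx) = dx^T J(x) dx with J(x) = sum_c p(c|x) grad_x log p(c|x) grad_x log p(c|x)^T. Directions in which x can vary without changing p(c|x) then contribute nothing to the distance, which is precisely the sense in which only variation reflected in the auxiliary data counts.

```python
import numpy as np

def fit_softmax(X, y, n_classes, lr=0.1, n_iter=500):
    """Fit a multinomial logistic model p(c|x) = softmax(W x + b)
    by plain gradient descent (illustrative, not tuned)."""
    n, d = X.shape
    W = np.zeros((n_classes, d))
    b = np.zeros(n_classes)
    Y = np.eye(n_classes)[y]                      # one-hot auxiliary labels
    for _ in range(n_iter):
        logits = X @ W.T + b
        logits -= logits.max(axis=1, keepdims=True)
        P = np.exp(logits)
        P /= P.sum(axis=1, keepdims=True)
        G = P - Y                                 # gradient of cross-entropy
        W -= lr * (G.T @ X) / n
        b -= lr * G.mean(axis=0)
    return W, b

def fisher_metric(x, W, b):
    """Fisher information matrix J(x) of p(c|x) with respect to x:
    J(x) = sum_c p(c|x) g_c g_c^T, where for the softmax model
    g_c = grad_x log p(c|x) = w_c - sum_k p(k|x) w_k."""
    logits = W @ x + b
    logits -= logits.max()
    p = np.exp(logits)
    p /= p.sum()
    mean_w = p @ W
    J = np.zeros((W.shape[1], W.shape[1]))
    for c in range(W.shape[0]):
        g = W[c] - mean_w
        J += p[c] * np.outer(g, g)
    return J

def local_distance(x, dx, W, b):
    """Squared local distance d^2(x, x+dx) = dx^T J(x) dx."""
    return dx @ fisher_metric(x, W, b) @ dx

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(400, 2))
    y = (X[:, 0] > 0).astype(int)    # auxiliary class depends on dim 0 only
    W, b = fit_softmax(X, y, n_classes=2)
    x = np.array([0.1, 0.0])
    print(local_distance(x, np.array([0.5, 0.0]), W, b))  # informative direction: large
    print(local_distance(x, np.array([0.0, 0.5]), W, b))  # uninformative direction: ~0
```

In the toy run at the bottom, the auxiliary class depends only on the first coordinate, so the learned metric nearly ignores steps along the second; any subsequent unsupervised analysis, such as clustering or a self-organizing map, that uses these local distances would then organize the data by class-relevant variation only.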
