Agglomerative Multivariate Information Bottleneck

The information bottleneck method is an unsupervised, model-independent technique for organizing data. Given a joint distribution P(A, B), this method constructs a new variable T that extracts partitions, or clusters, over the values of A that are informative about B. In a recent paper, we introduced a general, principled framework for multivariate extensions of the information bottleneck method that allows us to consider multiple systems of data partitions that are inter-related. In this paper, we present a new family of simple agglomerative algorithms to construct such systems of inter-related clusters. We analyze the behavior of these algorithms and apply them to several real-life datasets.
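
To make the agglomerative idea concrete, the following is a minimal sketch of the simpler single-variable case only, not the multivariate algorithms this paper presents. It assumes hard clusters over the values of A and greedily merges the pair of clusters whose merger loses the least mutual information I(T; B). The function names (`mutual_information`, `agglomerative_ib`) and the list-of-rows representation of P(A, B) are illustrative assumptions. The sketch recomputes the mutual information by brute force for every candidate merge, which is easy to check but not efficient.

```python
import math


def mutual_information(joint):
    """Return I(T; B) in nats for a joint given as rows of p(t, b)."""
    row_sums = [sum(r) for r in joint]
    col_sums = [sum(c) for c in zip(*joint)]
    mi = 0.0
    for i, r in enumerate(joint):
        for j, p in enumerate(r):
            if p > 0:
                mi += p * math.log(p / (row_sums[i] * col_sums[j]))
    return mi


def agglomerative_ib(joint, n_clusters):
    """Greedily merge values of A until n_clusters clusters remain.

    Assumption: hard clustering, so merging two clusters simply adds
    their rows of the joint distribution.
    Returns (clusters, rows): member indices and merged rows p(t, b).
    """
    rows = [list(r) for r in joint]
    clusters = [[i] for i in range(len(rows))]
    while len(rows) > n_clusters:
        current = mutual_information(rows)
        best = None
        # Try every pair and keep the merge with the smallest loss of I(T; B).
        for i in range(len(rows)):
            for j in range(i + 1, len(rows)):
                merged = [x + y for x, y in zip(rows[i], rows[j])]
                trial = [r for k, r in enumerate(rows) if k not in (i, j)]
                trial.append(merged)
                loss = current - mutual_information(trial)
                if best is None or loss < best[0]:
                    best = (loss, i, j)
        _, i, j = best
        merged_row = [x + y for x, y in zip(rows[i], rows[j])]
        merged_members = clusters[i] + clusters[j]
        rows = [r for k, r in enumerate(rows) if k not in (i, j)]
        rows.append(merged_row)
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged_members)
    return clusters, rows
```

For instance, given a toy joint distribution in which two values of A share one conditional distribution over B and two others share a different one, calling `agglomerative_ib(joint, 2)` should group the matching values together, since those merges lose no information about B.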
