Paper information - Associative Clustering by Maximizing a Bayes Factor

Associative Clustering by Maximizing a Bayes Factor

Abstract

Clustering by maximizing the dependency between (margin) groupings or partitionings of co-occurring data pairs is studied. We suggest a probabilistic criterion that generalizes discriminative clustering (DC), an extension of the information bottleneck (IB) principle to labeled continuous data. The criterion is the Bayes factor between models assuming dependence and independence of the two cluster sets, and it can be used as a well-founded criterion for IB for small data sets. With suitable prior assumptions the Bayes factor is equivalent to the hypergeometric probability of a contingency table with the optimized clusters at the margins, and for large data it becomes the standard mutual information. An algorithm for two-margin clustering of paired continuous data, associative clustering (AC), is introduced. Genes are clustered to find dependencies between gene expression and transcription factor binding, and dependencies between expression in different organisms.

1 Introduction

Distributional clustering by the information bottleneck (IB) principle [20] groups nominal values x of a random variable X by maximizing the dependency of the groups with another, co-occurring discrete variable Y. Clustering documents x by the occurrences of words y in them is an example. For a continuous X, the analogue of IB is to partition the space of possible values x ∈ R
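The abstract's link between the hypergeometric probability of a contingency table and mutual information can be checked numerically. The sketch below is not the paper's algorithm and does not reproduce its priors or constants, which this excerpt does not give. It assumes only the standard multivariate hypergeometric formula for a table with fixed margins, and the function names `log_hypergeometric_prob` and `mutual_information` are hypothetical. By Stirling's approximation, the negative log-probability per sample should approach the empirical mutual information (in nats) as the counts grow.

```python
import math


def log_hypergeometric_prob(table):
    """Log-probability of a contingency table given its margins.

    Uses the standard multivariate hypergeometric formula
    P = (prod_i n_i.! * prod_j n_.j!) / (N! * prod_ij n_ij!).
    This is an illustrative assumption, not the paper's exact criterion.
    """
    row_sums = [sum(row) for row in table]
    col_sums = [sum(col) for col in zip(*table)]
    n = sum(row_sums)

    def log_fact(k):
        return math.lgamma(k + 1)  # log(k!) without overflow

    return (sum(log_fact(r) for r in row_sums)
            + sum(log_fact(c) for c in col_sums)
            - log_fact(n)
            - sum(log_fact(x) for row in table for x in row))


def mutual_information(table):
    """Empirical mutual information (nats) of the table's joint distribution."""
    row_sums = [sum(row) for row in table]
    col_sums = [sum(col) for col in zip(*table)]
    n = sum(row_sums)
    mi = 0.0
    for i, row in enumerate(table):
        for j, x in enumerate(row):
            if x > 0:  # empty cells contribute nothing
                mi += (x / n) * math.log(x * n / (row_sums[i] * col_sums[j]))
    return mi


# Scale a dependent 2x2 table and compare the two quantities.
base = [[30, 10], [10, 30]]
for scale in (1, 10, 100, 1000):
    table = [[x * scale for x in row] for row in base]
    n = sum(sum(row) for row in table)
    per_sample = -log_hypergeometric_prob(table) / n
    print(scale, per_sample, mutual_information(table))
```

For small tables the two columns differ noticeably, which is consistent with the abstract's point that a finite-sample criterion matters for small data sets; the gap shrinks as the counts are scaled up.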

Samuel Kaski | Janne Sinkkonen | Leo Lahti | Janne Nikkilä

[1] R. Tibshirani, et al. Discriminant Analysis by Gaussian Mixtures, 1996.

[2] Naftali Tishby, et al. Unsupervised document classification using sequential information maximization, 2002, SIGIR '02.

[3] Donna R. Maglott, et al. RefSeq and LocusLink: NCBI gene-centered resources, 2001, Nucleic Acids Res.

[4] Zohar Yakhini, et al. Clustering gene expression patterns, 1999, J. Comput. Biol.

[5] Noam Slonim, et al. Maximum Likelihood and the Information Bottleneck, 2002, NIPS.

[6] I. Good. On the Application of Symmetric Dirichlet Distributions and their Mixtures to Contingency Tables, 1976.

[7] D. Botstein, et al. Generalized singular value decomposition for comparative analysis of genome-scale expression data sets of two different organisms, 2003, Proceedings of the National Academy of Sciences of the United States of America.

[8] Nicola J. Rinaldi, et al. Transcriptional Regulatory Networks in Saccharomyces cerevisiae, 2002, Science.

[9] Michael I. Jordan, et al. Distance Metric Learning with Application to Clustering with Side-Information, 2002, NIPS.

[10] Jim Kay, et al. Feature discovery under contextual supervision using mutual information, 1992, [Proceedings 1992] IJCNN International Joint Conference on Neural Networks.

[11] Peter G. Schultz, et al. Large-scale analysis of the human and mouse, 2002.

[12] A. Orth, et al. Large-scale analysis of the human and mouse transcriptomes, 2002, Proceedings of the National Academy of Sciences of the United States of America.

[13] D. Botstein, et al. Cluster analysis and display of genome-wide expression patterns, 1998, Proceedings of the National Academy of Sciences of the United States of America.

[14] Yudong D. He, et al. Functional Discovery via a Compendium of Expression Profiles, 2000, Cell.

[15] Wray L. Buntine. Variational Extensions to EM and Multinomial PCA, 2002, ECML.

[16] Michael I. Jordan, et al. Latent Dirichlet Allocation, 2001, J. Mach. Learn. Res.

[17] Tommi S. Jaakkola, et al. Kernel Expansions with Unlabeled Examples, 2000, NIPS.

[18] David J. Miller, et al. A Mixture of Experts Classifier with Learning Based on Both Labelled and Unlabelled Data, 1996, NIPS.

[19] Thomas Hofmann, et al. Unsupervised Learning by Probabilistic Latent Semantic Analysis, 2004, Machine Learning.

[20] Gal Chechik, et al. Extracting Relevant Structures with Side Information, 2002, NIPS.

[21] Samuel Kaski, et al. Discriminative Clustering: Optimal Contingency Tables by Learning Metrics, 2002, ECML.

[22] Samuel Kaski, et al. Clustering Based on Conditional Distributions in an Auxiliary Space, 2002, Neural Computation.

[23] Ben Taskar, et al. Rich probabilistic models for gene expression, 2001, ISMB.

[24] Samuel Kaski, et al. Regularized discriminative clustering, 2003, 2003 IEEE XIII Workshop on Neural Networks for Signal Processing (IEEE Cat. No.03TH8718).