Model order selection: Criteria, inference strategies and an application to biclustering

In this thesis we study unsupervised clustering methods that select the number of clusters on their own. Traditional methods based on information theory compare different models by penalizing more complex ones. More recently, a sophisticated method known as the Dirichlet process has been applied to clustering problems; one of its biggest advantages is its theoretically sound foundation: a single model covers any number of clusters. This, however, comes at a price: inference is arguably even harder than for "standard" clustering models, although in recent years researchers have proposed approximation algorithms that run efficiently at the cost of some accuracy. In this thesis we empirically compare these algorithms on synthetic data. We also compare the results with algorithms stemming from motivations other than the Dirichlet process, such as the Akaike information criterion (AIC) or the Bayesian information criterion (BIC); their standard forms are recalled in the sketch below. In the second part we study the application of the Dirichlet process to the problem of biclustering and propose two novel nonparametric algorithms, each assuming a different problem formulation. The two algorithms may also prove useful for feature selection and dimensionality reduction.

Acknowledgments

First and above all, I want to express my gratitude to Peter Orbanz; during the course of this master's thesis he always took the time to answer my questions and gave me valuable input on how to improve certain experiments or express facts more concisely. He has the great gift of explaining things in an easy-to-understand way without sacrificing correctness. I had already witnessed this in the machine learning courses I attended as part of my studies, where Peter was a teaching assistant.

I would also like to thank Prof. Buhmann for being my mentor and fostering my interest in machine learning during my master's studies. He was very supportive in finding a topic that suits my interests and knowledge, and also left a certain degree of freedom to see where the journey would take us. His comments in various meetings were also very helpful in refining the biclustering models.

I was fortunate enough to work for half a year as an intern under the supervision of Matthew Brand at the Mitsubishi Electric Research Labs (MERL) in Cambridge, MA. Matt, possessing an immense knowledge of areas as diverse as machine learning, graphics, computer vision, and theoretical computer science, could usually answer questions I …
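For reference, a minimal sketch of the penalized model-selection criteria mentioned in the abstract (due to Akaike and Schwarz, respectively), in their standard form; the symbols k (number of free model parameters), n (number of observations), and \hat{L} (maximized likelihood) are introduced here only for this sketch:

    \mathrm{AIC} = 2k - 2\ln\hat{L}, \qquad \mathrm{BIC} = k\ln n - 2\ln\hat{L}

By contrast, the Dirichlet process avoids fixing the number of clusters in advance. In its Chinese restaurant process representation, with concentration parameter \alpha and n_c the current size of cluster c, observation i is assigned by

    P(z_i = c \mid z_1, \ldots, z_{i-1}) = \frac{n_c}{i - 1 + \alpha}, \qquad P(z_i = \text{new cluster}) = \frac{\alpha}{i - 1 + \alpha},

so the number of clusters grows with the data rather than being selected beforehand.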
