An Infinite Mixture of Inverted Dirichlet Distributions

In this paper, we present an infinite mixture model based on inverted Dirichlet distributions. The proposed mixture is learned using a fully Bayesian approach, which overcomes a challenging issue in data clustering, namely the automatic selection of the number of clusters. We evaluate the proposed approach on the challenging problem of text categorization. The results show that it models positive data effectively when compared with results reported for the infinite Gaussian mixture.
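
For readers unfamiliar with the component density, the following is a minimal sketch of the log-density of a single inverted Dirichlet distribution, assuming the standard parameterization with D positive features and D + 1 positive parameters. The function name and argument layout are illustrative assumptions and are not taken from the paper; the sketch covers only one mixture component, not the infinite mixture or its Bayesian learning procedure.

```python
import math


def inverted_dirichlet_logpdf(x, alpha):
    """Log-density of an inverted Dirichlet distribution (illustrative sketch).

    x     : sequence of D strictly positive values
    alpha : sequence of D + 1 strictly positive parameters

    Assumed density:
        p(x | alpha) = Gamma(|alpha|) / prod_d Gamma(alpha_d)
                       * prod_{d<=D} x_d^(alpha_d - 1)
                       * (1 + sum_d x_d)^(-|alpha|)
    where |alpha| is the sum of all D + 1 parameters.
    """
    if len(alpha) != len(x) + 1:
        raise ValueError("alpha must have exactly one more entry than x")
    if any(v <= 0 for v in x) or any(a <= 0 for a in alpha):
        raise ValueError("x and alpha must be strictly positive")

    total = sum(alpha)
    # Log of the normalizing constant.
    log_norm = math.lgamma(total) - sum(math.lgamma(a) for a in alpha)
    # zip stops after the first D parameters, which pair with x_1..x_D.
    log_kernel = sum((a - 1.0) * math.log(v) for a, v in zip(alpha, x))
    return log_norm + log_kernel - total * math.log1p(sum(x))
```

With D = 1 and alpha = (1, 1), the density reduces to (1 + x)^(-2), which gives a quick sanity check on the normalizing constant.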
