EP-Based Infinite Inverted Dirichlet Mixture Learning: Application to Image Spam Detection

In this paper we propose a new fully unsupervised model, based on a Dirichlet process prior and the inverted Dirichlet distribution, that allows clusters to be inferred automatically from the data. The main idea is to let the number of mixture components grow as new vectors arrive. This answers the model selection problem in an elegant way, since the resulting model can be viewed as an infinite inverted Dirichlet mixture. An expectation propagation (EP) inference methodology is developed to learn this model by obtaining a full posterior distribution over its parameters. We validate the model on a challenging application, namely image spam filtering, to demonstrate the merits of the framework.
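For context, the two building blocks of such a model can be summarized with their standard forms: the inverted Dirichlet density over positive vectors and the stick-breaking view of a Dirichlet process mixture. The sketch below uses these textbook formulations; the symbols X, alpha_k, pi_k, and eta are illustrative notation rather than taken from the paper.

% Inverted Dirichlet density for a positive vector X = (x_1, ..., x_D)
% with parameters alpha = (alpha_1, ..., alpha_{D+1}), alpha_d > 0:
\mathrm{ID}(X \mid \alpha) =
  \frac{\Gamma\!\left(\sum_{d=1}^{D+1}\alpha_d\right)}
       {\prod_{d=1}^{D+1}\Gamma(\alpha_d)}
  \;\prod_{d=1}^{D} x_d^{\alpha_d - 1}
  \left(1 + \sum_{d=1}^{D} x_d\right)^{-\sum_{d=1}^{D+1}\alpha_d}

% Stick-breaking construction of the infinite mixture, with
% Dirichlet process concentration parameter eta:
v_k \sim \mathrm{Beta}(1,\eta), \qquad
\pi_k = v_k \prod_{j=1}^{k-1}(1 - v_j), \qquad
p(X) = \sum_{k=1}^{\infty} \pi_k \, \mathrm{ID}(X \mid \alpha_k)

In such a construction, EP would approximate the intractable posterior over the mixture parameters by iteratively refining simpler factor approximations, rather than by sampling; the precise factorization used here is specific to the paper.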
