Variational learning of a Dirichlet process of generalized Dirichlet distributions for simultaneous clustering and feature selection

This paper introduces a novel enhancement for unsupervised feature selection based on generalized Dirichlet (GD) mixture models. Our proposal extends the finite mixture model previously developed in [1] to the infinite case through Dirichlet process mixtures. These can be viewed as purely nonparametric models, since the number of mixture components can grow as more data are observed. The infinite assumption avoids problems related to model selection (i.e., determining the number of clusters) and allows data to be separated into similar clusters while relevant features are selected at the same time. The resulting model is learned within a principled variational Bayesian framework that we develop. Experimental results on synthetic data and on challenging real-world applications involving image categorization, automatic semantic annotation, and retrieval show that our approach yields accurate models, distinguishing relevant from irrelevant features without over- or under-fitting the data.
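As a rough illustration of the ingredients the abstract combines, the Python sketch below (standard library only) samples from a truncated version of such a model. It does not implement the variational learning procedure. It assumes (i) the truncated stick-breaking representation of the Dirichlet process used for variational inference in [5], with the last stick fraction fixed to 1; (ii) the standard construction of a GD vector from independent Beta variables, whose inverse turns a GD vector back into independent Beta features; and (iii) a feature-saliency formulation in the spirit of [69], in which an irrelevant feature is drawn from a shared background Beta distribution. The paper's exact feature-selection model may differ. All function names, parameter values, and the truncation level are hypothetical.

```python
import random


def stick_breaking_weights(concentration, truncation, rng):
    """Truncated stick-breaking weights for a Dirichlet process prior.

    v_j ~ Beta(1, concentration) and pi_j = v_j * prod_{l<j} (1 - v_l).
    The last fraction is fixed to 1 so the `truncation` weights sum to one.
    """
    weights, remaining = [], 1.0
    for j in range(truncation):
        v = 1.0 if j == truncation - 1 else rng.betavariate(1.0, concentration)
        weights.append(v * remaining)
        remaining *= 1.0 - v
    return weights


def beta_features_to_gd(w):
    """Map independent Beta variables W_d to a GD vector:
    X_1 = W_1 and X_d = W_d * prod_{l<d} (1 - W_l), so sum(X) < 1."""
    x, remaining = [], 1.0
    for w_d in w:
        x.append(w_d * remaining)
        remaining *= 1.0 - w_d
    return x


def gd_to_beta_features(x):
    """Inverse map: W_1 = X_1 and W_d = X_d / (1 - sum_{l<d} X_l).
    Under a GD model the W_d are mutually independent Beta variables,
    so relevance can be examined one transformed feature at a time."""
    w, cumulative = [], 0.0
    for x_d in x:
        w.append(x_d / (1.0 - cumulative))
        cumulative += x_d
    return w


def sample_point(weights, cluster_params, background, saliency, rng):
    """Hypothetical generative step with feature saliency.

    Choose a cluster j ~ Categorical(weights). Feature d is relevant with
    probability saliency[d] and is then drawn from the cluster's Beta
    parameters; otherwise it comes from a shared background Beta.
    Returns the cluster index and the GD-distributed vector X.
    """
    j = rng.choices(range(len(weights)), weights=weights)[0]
    w = []
    for d, (a, b) in enumerate(cluster_params[j]):
        if rng.random() < saliency[d]:
            w.append(rng.betavariate(a, b))  # relevant: cluster-specific Beta
        else:
            w.append(rng.betavariate(*background[d]))  # irrelevant: shared Beta
    return j, beta_features_to_gd(w)


if __name__ == "__main__":
    rng = random.Random(0)
    pi = stick_breaking_weights(concentration=1.0, truncation=20, rng=rng)
    print("largest weights:", sorted(pi, reverse=True)[:5])
    params = [[(2.0, 5.0), (5.0, 2.0), (3.0, 3.0)] for _ in pi]
    j, x = sample_point(pi, params, [(1.0, 1.0)] * 3, [0.9, 0.9, 0.1], rng)
    print("cluster", j, "x", x, "recovered W", gd_to_beta_features(x))
```

Because the transformed features are independent, relevance can be handled feature by feature. In a fitted model, components that the data do not support would be expected to receive negligible weight, which is how an infinite formulation sidesteps fixing the number of clusters in advance.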

[1] Cordelia Schmid, et al. High-dimensional data clustering, 2006, Comput. Stat. Data Anal.

[2] S. Roberts, et al. Variational Bayes for non-Gaussian autoregressive models, 2000, Neural Networks for Signal Processing X. Proceedings of the 2000 IEEE Signal Processing Society Workshop (Cat. No.00TH8501).

[3] Anirban Dasgupta, et al. Feature selection methods for text classification, 2007, KDD '07.

[4] Jia Li. A mutual semantic endorsement approach to image retrieval and context provision, 2005, MIR '05.

[5] Michael I. Jordan, et al. Variational inference for Dirichlet process mixtures, 2006.

[6] Jianping Fan, et al. Statistical modeling and conceptualization of natural images, 2005, Pattern Recognit.

[7] Yan Ke, et al. PCA-SIFT: a more distinctive representation for local image descriptors, 2004, Proceedings of the 2004 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2004. CVPR 2004.

[8] George A. Miller, et al. WordNet: A Lexical Database for English, 1995, HLT.

[9] Vincent Lepetit, et al. Keypoint recognition using randomized trees, 2006, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[10] Thomas Hofmann, et al. Unsupervised Learning by Probabilistic Latent Semantic Analysis, 2004, Machine Learning.

[11] Jitendra Malik, et al. Normalized cuts and image segmentation, 1997, Proceedings of IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[12] Spiridon D. Likothanassis, et al. Integrating feature and instance selection for text classification, 2002, KDD.

[13] Pedro Larrañaga, et al. Dimensionality Reduction in Unsupervised Learning of Conditional Gaussian Networks, 2001, IEEE Trans. Pattern Anal. Mach. Intell.

[14] Jing Zhou, et al. Streaming feature selection using alpha-investing, 2005, KDD '05.

[15] Michael I. Jordan, et al. Hierarchical Dirichlet Processes, 2006.

[16] Bernt Schiele, et al. Local features for object class recognition, 2005, Tenth IEEE International Conference on Computer Vision (ICCV'05) Volume 1.

[17] Nizar Bouguila, et al. A Graphical Model for Content Based Image Suggestion and Feature Selection, 2007, PKDD.

[18] Fei-Fei Li, et al. Spatially Coherent Latent Topic Model for Concurrent Segmentation and Classification of Objects and Scenes, 2007, 2007 IEEE 11th International Conference on Computer Vision.

[19] Andrew Zisserman, et al. Scene Classification Via pLSA, 2006, ECCV.

[20] Christopher M. Bishop, et al. Variational Relevance Vector Machines, 2000, UAI.

[21] Hagai Attias, et al. A Variational Bayesian Framework for Graphical Models, 1999.

[22] Nizar Bouguila, et al. A hybrid SEM algorithm for high-dimensional unsupervised learning using a finite generalized Dirichlet mixture, 2006, IEEE Transactions on Image Processing.

[23] Michael I. Jordan, et al. An Introduction to Variational Methods for Graphical Models, 1999, Machine Learning.

[24] Nizar Bouguila, et al. High-Dimensional Unsupervised Selection and Estimation of a Finite Generalized Dirichlet Mixture Model Based on Minimum Message Length, 2007, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[25] Hilbert J. Kappen, et al. General Lower Bounds based on Computer Generated Higher Order Expansions, 2012, UAI.

[26] Philip S. Yu, et al. Finding generalized projected clusters in high dimensional spaces, 2000, SIGMOD '00.

[27] Matthieu Cord, et al. Feature-based approach to semi-supervised similarity learning, 2006, Pattern Recognit.

[28] Jordi Vitrià, et al. On the Selection and Classification of Independent Features, 2003, IEEE Trans. Pattern Anal. Mach. Intell.

[29] J. Sethuraman. A Constructive Definition of Dirichlet Priors, 1991.

[30] M. Escobar. Estimating Normal Means with a Dirichlet Process Prior, 1994.

[31] C. Antoniak. Mixtures of Dirichlet Processes with Applications to Bayesian Nonparametric Problems, 1974.

[32] Jiri Matas, et al. Object Recognition using the Invariant Pixel-Set Signature, 2000, BMVC.

[33] Lancelot F. James, et al. Gibbs Sampling Methods for Stick-Breaking Priors, 2001.

[34] Jinwen Ma, et al. A cost-function approach to rival penalized competitive learning (RPCL), 2006, IEEE Transactions on Systems, Man, and Cybernetics, Part B (Cybernetics).

[35] Dean P. Foster, et al. Variable Selection in Data Mining, 2004.

[36] D. Dunson. Bayesian Semiparametric Isotonic Regression for Count Data, 2005.

[37] M. Clyde, et al. Multiple shrinkage and subset selection in wavelets, 1998.

[38] Hichem Frigui, et al. Clustering by competitive agglomeration, 1997, Pattern Recognit.

[39] Martijn A. R. Leisink, et al. Computer generated higher order expansions, 2002, UAI 2002.

[40] Arne Leijon, et al. Bayesian Estimation of Beta Mixture Models with Variational Inference, 2011, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[41] William I. Grosky, et al. From features to semantics: some preliminary results, 2000, 2000 IEEE International Conference on Multimedia and Expo. ICME2000. Proceedings. Latest Advances in the Fast Changing World of Multimedia (Cat. No.00TH8532).

[42] James Ze Wang, et al. Toward bridging the annotation-retrieval gap in image search by a generative modeling approach, 2006, MM '06.

[43] Milind R. Naphade, et al. A probabilistic framework for semantic video indexing, filtering, and retrieval, 2001, IEEE Trans. Multim.

[44] Robert Tibshirani, et al. Estimating the number of clusters in a data set via the gap statistic, 2000.

[45] Thomas Hofmann. Unsupervised Learning by Probabilistic Latent Semantic Analysis, 2001.

[46] Gabriela Csurka, et al. Visual categorization with bags of keypoints, 2004, ECCV 2004.

[47] Pietro Perona, et al. Object class recognition by unsupervised scale-invariant learning, 2003, 2003 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2003. Proceedings.

[48] Chris H. Q. Ding, et al. K-means clustering via principal component analysis, 2004, ICML.

[49] Anil K. Jain, et al. Simultaneous Feature Selection and Clustering Using Mixture Models, 2004.

[50] Adrian E. Raftery, et al. Linear flaw detection in woven textiles using model-based clustering, 1997, Pattern Recognit. Lett.

[51] Daniel Gatica-Perez, et al. On image auto-annotation with latent space models, 2003, ACM Multimedia.

[52] Andrew Zisserman, et al. Scene Classification Using a Hybrid Generative/Discriminative Approach, 2008.

[53] Lexing Xie, et al. Slightly Supervised Learning of Part-Based Appearance Models, 2004, 2004 Conference on Computer Vision and Pattern Recognition Workshop.

[54] S. MacEachern, et al. Estimating mixture of Dirichlet process models, 1998.

[55] Joachim M. Buhmann, et al. Stability-Based Validation of Clustering Solutions, 2004, Neural Computation.

[56] Shaoping Ma, et al. Relevance feedback in content-based image retrieval: Bayesian framework, feature subspaces, and progressive learning, 2003, IEEE Trans. Image Process.

[57] Christian P. Robert, et al. Monte Carlo Statistical Methods (Springer Texts in Statistics), 2005.

[58] Rong Yan, et al. Mining Associated Text and Images with Dual-Wing Harmoniums, 2005, UAI.

[59] Nizar Bouguila, et al. A Dirichlet Process Mixture of Generalized Dirichlet Distributions for Proportional Data Modeling, 2010, IEEE Transactions on Neural Networks.

[60] Fei-Fei Li, et al. What, where and who? Classifying events by scene and object recognition, 2007, 2007 IEEE 11th International Conference on Computer Vision.

[61] Volker Roth, et al. Bayesian class discovery in microarray datasets, 2004, IEEE Transactions on Biomedical Engineering.

[62] Cordelia Schmid, et al. Scale & Affine Invariant Interest Point Detectors, 2004, International Journal of Computer Vision.

[63] Yimin Wu, et al. Feature selection for classifying high-dimensional numerical data, 2004, Proceedings of the 2004 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2004. CVPR 2004.

[64] Bruce A. Draper, et al. Feature selection from huge feature sets, 2001, Proceedings Eighth IEEE International Conference on Computer Vision. ICCV 2001.

[65] Radford M. Neal. Pattern Recognition and Machine Learning, 2007, Technometrics.

[66] Marina Meila, et al. The uniqueness of a good optimum for K-means, 2006, ICML.

[67] Carl E. Rasmussen, et al. The Infinite Gaussian Mixture Model, 1999, NIPS.

[68] J. Dickey. Multiple Hypergeometric Functions: Probabilistic Interpretations and Statistical Uses, 1983.

[69] Anil K. Jain, et al. Simultaneous feature selection and clustering using mixture models, 2004, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[70] Hong-Jiang Zhang. Relevance Feedback in Content-Based Image Retrieval, 2003.

[71] Thierry Denoeux, et al. Learning from partially supervised data using mixture models and belief functions, 2009, Pattern Recognit.

[72] Andrew Zisserman, et al. Scene Classification Using a Hybrid Generative/Discriminative Approach, 2008, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[73] Lancelot F. James, et al. Some further developments for stick-breaking priors: Finite and infinite clustering and classification, 2003.

[74] Alexei A. Efros, et al. Using Multiple Segmentations to Discover Objects and their Extent in Image Collections, 2006, 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06).

[75] Zoubin Ghahramani, et al. Propagation Algorithms for Variational Bayesian Learning, 2000, NIPS.

[76] Christopher M. Bishop, et al. Pattern Recognition and Machine Learning (Information Science and Statistics), 2006.

[77] Filippo Menczer, et al. Feature selection in unsupervised learning via evolutionary search, 2000, KDD '00.

[78] Jiebo Luo, et al. A Bayesian network-based framework for semantic image understanding, 2005, Pattern Recognit.

[79] Temple F. Smith. Occam's razor, 1980, Nature.

[80] Christian P. Robert, et al. Monte Carlo Statistical Methods, 2005, Springer Texts in Statistics.

[81] P. Deb. Finite Mixture Models, 2008.

[82] David A. Forsyth, et al. Learning the semantics of words and pictures, 2001, Proceedings Eighth IEEE International Conference on Computer Vision. ICCV 2001.

[83] Nizar Bouguila, et al. A Hybrid Feature Extraction Selection Approach for High-Dimensional Non-Gaussian Data Clustering, 2009, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[84] Christiane Fellbaum, et al. Book Reviews: WordNet: An Electronic Lexical Database, 1999, CL.

[85] T. Ferguson. Bayesian Density Estimation by Mixtures of Normal Distributions, 1983.

[86] R. M. Korwar, et al. Contributions to the Theory of Dirichlet Processes, 1973.

[87] Antonio Torralba, et al. LabelMe: A Database and Web-Based Tool for Image Annotation, 2008, International Journal of Computer Vision.

[88] Radford M. Neal. Markov Chain Sampling Methods for Dirichlet Process Mixture Models, 2000.

[89] Aristidis Likas, et al. Bayesian feature and model selection for Gaussian mixture models, 2006, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[90] D. Dunson, et al. Bayesian Selection and Clustering of Polymorphisms in Functionally Related Genes, 2008.

[91] Nicolas Hervé, et al. Image annotation: which approach for realistic databases?, 2007, CIVR '07.

[92] Edward Y. Chang, et al. CBSA: content-based soft annotation for multimodal image retrieval using Bayes point machines, 2003, IEEE Trans. Circuits Syst. Video Technol.

[93] Nizar Bouguila, et al. A countably infinite mixture model for clustering and feature selection, 2011, Knowledge and Information Systems.

[94] Tharam S. Dillon, et al. A Statistical-Heuristic Feature Selection Criterion for Decision Tree Induction, 1991, IEEE Trans. Pattern Anal. Mach. Intell.

[95] Ji Zhu, et al. Variable Selection for Model-Based High-Dimensional Clustering and Its Application to Microarray Data, 2008, Biometrics.