Unsupervised Hybrid Feature Extraction Selection for High-Dimensional Non-Gaussian Data Clustering with Variational Inference

Clustering has been a subject of extensive research in data mining, pattern recognition, and other areas for several decades. The main goal is to assign samples, which are typically non-Gaussian and expressed as points in high-dimensional feature spaces, to one of a number of clusters. It is well known that in such high-dimensional settings, the existence of irrelevant features generally compromises modeling capabilities. In this paper, we propose a variational inference framework for unsupervised non-Gaussian feature selection, in the context of finite generalized Dirichlet (GD) mixture-based clustering. Under the proposed principled variational framework, we simultaneously estimate, in a closed form, all the involved parameters and determine the complexity (i.e., both model an feature selection) of the GD mixture. Extensive simulations using synthetic data along with an analysis of real-world data and human action videos demonstrate that our variational approach achieves better results than comparable techniques.

[1]  Anil K. Jain,et al.  Simultaneous feature selection and clustering using mixture models , 2004, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[2]  Carla E. Brodley,et al.  Multivariate decision trees , 2004, Machine Learning.

[3]  Michael I. Jordan,et al.  Bayesian parameter estimation via variational methods , 2000, Stat. Comput..

[4]  J. Carroll,et al.  Synthesized clustering: A method for amalgamating alternative clustering bases with differential weighting of variables , 1984 .

[5]  Hoon Kim,et al.  Monte Carlo Statistical Methods , 2000, Technometrics.

[6]  David A. Bell,et al.  A Formalism for Relevance and Its Application in Feature Subset Selection , 2000, Machine Learning.

[7]  Heng Lian,et al.  Sparse Bayesian hierarchical modeling of high-dimensional clustering problems , 2009, J. Multivar. Anal..

[8]  P. Hall,et al.  On the adequacy of variational lower bound functions for likelihood‐based inference in Markovian models with missing values , 2002 .

[9]  Cordelia Schmid,et al.  Learning realistic human actions from movies , 2008, 2008 IEEE Conference on Computer Vision and Pattern Recognition.

[10]  Hiroshi Motoda,et al.  Computational Methods of Feature Selection , 2022 .

[11]  Nizar Bouguila,et al.  A hybrid SEM algorithm for high-dimensional unsupervised learning using a finite generalized Dirichlet mixture , 2006, IEEE Transactions on Image Processing.

[12]  Michael I. Jordan,et al.  Computing upper and lower bounds on likelihoods in intractable networks , 1996, UAI.

[13]  Nizar Bouguila,et al.  A Hybrid Feature Extraction Selection Approach for High-Dimensional Non-Gaussian Data Clustering , 2009, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[14]  Enrique F. Castillo,et al.  Learning and Updating of Uncertainty in Dirichlet Models , 2004, Machine Learning.

[15]  Peter Grünwald,et al.  Invited review of the book Statistical and Inductive Inference by Minimum Message Length , 2006 .

[16]  David J. Spiegelhalter,et al.  VIBES: A Variational Inference Engine for Bayesian Networks , 2002, NIPS.

[17]  Serge J. Belongie,et al.  Behavior recognition via sparse spatio-temporal features , 2005, 2005 IEEE International Workshop on Visual Surveillance and Performance Evaluation of Tracking and Surveillance.

[18]  Adrian G. Bors,et al.  Variational expectation-maximization training for Gaussian networks , 2003, 2003 IEEE XIII Workshop on Neural Networks for Signal Processing (IEEE Cat. No.03TH8718).

[19]  Todd K. Leen,et al.  Feature selection for improved classification , 1992, [Proceedings 1992] IJCNN International Joint Conference on Neural Networks.

[20]  Barbara Caputo,et al.  Recognizing human actions: a local SVM approach , 2004, Proceedings of the 17th International Conference on Pattern Recognition, 2004. ICPR 2004..

[21]  Michael I. Jordan,et al.  Variational Probabilistic Inference and the QMR-DT Network , 2011, J. Artif. Intell. Res..

[22]  Michael K. Ng,et al.  Automated variable weighting in k-means type clustering , 2005, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[23]  P. Deb Finite Mixture Models , 2008 .

[24]  Nizar Bouguila,et al.  A Model-Based Approach for Discrete Data Clustering and Feature Weighting Using MAP and Stochastic Complexity , 2009, IEEE Transactions on Knowledge and Data Engineering.

[25]  A. Raftery,et al.  Variable Selection for Model-Based Clustering , 2006 .

[26]  Alexander Brook Variational Segmentation for Color images , 2002 .

[27]  Thomas Serre,et al.  A Biologically Inspired System for Action Recognition , 2007, 2007 IEEE 11th International Conference on Computer Vision.

[28]  Nizar Bouguila,et al.  Unsupervised selection of a finite Dirichlet mixture model: an MML-based approach , 2006, IEEE Transactions on Knowledge and Data Engineering.

[29]  E. Fowlkes,et al.  Variable selection in clustering , 1988 .

[30]  J. Friedman,et al.  Clustering objects on subsets of attributes (with discussion) , 2004 .

[31]  G. McLachlan,et al.  The EM algorithm and extensions , 1996 .

[32]  Nando de Freitas,et al.  Variational MCMC , 2001, UAI.

[33]  Jing Hua,et al.  Simultaneous Localized Feature Selection and Model Detection for Gaussian Mixtures , 2009, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[34]  Aristidis Likas,et al.  Bayesian feature and model selection for Gaussian mixture models , 2006, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[35]  Luc Van Gool,et al.  An Efficient Dense and Scale-Invariant Spatio-Temporal Interest Point Detector , 2008, ECCV.

[36]  Filippo Menczer,et al.  Feature selection in unsupervised learning via evolutionary search , 2000, KDD '00.

[37]  Adrian E. Raftery,et al.  Model-Based Clustering, Discriminant Analysis, and Density Estimation , 2002 .

[38]  R. Horgan,et al.  Statistical Field Theory , 2014 .

[39]  Gabriela Csurka,et al.  Visual categorization with bags of keypoints , 2002, eccv 2004.

[40]  Geoffrey J. McLachlan,et al.  Finite Mixture Models , 2019, Annual Review of Statistics and Its Application.

[41]  Juan Carlos Niebles,et al.  Unsupervised Learning of Human Action Categories Using Spatial-Temporal Words , 2006, BMVC.

[42]  Rama Chellappa,et al.  Machine Recognition of Human Activities: A Survey , 2008, IEEE Transactions on Circuits and Systems for Video Technology.

[43]  Nizar Bouguila,et al.  Practical Bayesian estimation of a finite beta mixture through gibbs sampling and its applications , 2006, Stat. Comput..

[44]  Roberto Cipolla,et al.  Extracting Spatiotemporal Interest Points using Global Information , 2007, 2007 IEEE 11th International Conference on Computer Vision.

[45]  P. Green,et al.  A preliminary study of optimal variable weighting in k-means clustering , 1990 .

[46]  Arne Leijon,et al.  Bayesian Estimation of Beta Mixture Models with Variational Inference , 2011, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[47]  I. Patras,et al.  Spatiotemporal salient points for visual recognition of human actions , 2005, IEEE Transactions on Systems, Man, and Cybernetics, Part B (Cybernetics).

[48]  Nizar Bouguila,et al.  On Bayesian analysis of a finite generalized Dirichlet mixture via a Metropolis-within-Gibbs sampling , 2009, Pattern Analysis and Applications.

[49]  Alan L. Yuille,et al.  Statistical Physics, Mixtures of Distributions, and the EM Algorithm , 1994, Neural Computation.

[50]  M. Brusco,et al.  A variable-selection heuristic for K-means clustering , 2001 .

[51]  Andrew Zisserman,et al.  Scene Classification Via pLSA , 2006, ECCV.

[52]  David J. Miller,et al.  Unsupervised learning of parsimonious mixtures on large spaces with integrated feature and component selection , 2006, IEEE Transactions on Signal Processing.

[53]  Shih-Fu Chang,et al.  Clustering methods for video browsing and annotation , 1996, Electronic Imaging.

[54]  Robert P. W. Duin,et al.  An Evaluation of Intrinsic Dimensionality Estimators , 1995, IEEE Trans. Pattern Anal. Mach. Intell..

[55]  Donald E. Brown,et al.  Fast generic selection of features for neural network classifiers , 1992, IEEE Trans. Neural Networks.

[56]  William D. Penny,et al.  Variational Bayes for generalized autoregressive models , 2002, IEEE Trans. Signal Process..

[57]  Nizar Bouguila,et al.  A Dirichlet Process Mixture of Generalized Dirichlet Distributions for Proportional Data Modeling , 2010, IEEE Transactions on Neural Networks.

[58]  O. Cappé,et al.  Markov Chain Monte Carlo: 10 Years and Still Running! , 2000 .

[59]  John W. Tukey,et al.  A Projection Pursuit Algorithm for Exploratory Data Analysis , 1974, IEEE Transactions on Computers.

[60]  Suzanna Becker,et al.  Learning to Categorize Objects Using Temporal Coherence , 1992, NIPS.

[61]  Jitendra Malik,et al.  Image Retrieval and Classification Using Local Distance Functions , 2006, NIPS.

[62]  Stephen P. Boyd,et al.  Convex Optimization , 2004, Algorithms and Theory of Computation Handbook.

[63]  Thomas Hofmann,et al.  Unsupervised Learning by Probabilistic Latent Semantic Analysis , 2004, Machine Learning.

[64]  Adrian Corduneanu,et al.  Variational Bayesian Model Selection for Mixture Distributions , 2001 .

[65]  G. Celeux,et al.  Variable Selection for Clustering with Gaussian Mixture Models , 2009, Biometrics.

[66]  Ivan Laptev,et al.  On Space-Time Interest Points , 2003, Proceedings Ninth IEEE International Conference on Computer Vision.

[67]  G. W. Milligan,et al.  A validation study of a variable weighting algorithm for cluster analysis , 1989 .

[68]  Aapo Hyvärinen,et al.  Proc. Conf. on Uncertainty in Artificial Intelligence (UAI) , 2010 .

[69]  Michael I. Jordan,et al.  Exploiting Tractable Substructures in Intractable Networks , 1995, NIPS.

[70]  Shuicheng Yan,et al.  SIFT-Bag kernel for video event analysis , 2008, ACM Multimedia.

[71]  Nizar Bouguila,et al.  High-Dimensional Unsupervised Selection and Estimation of a Finite Generalized Dirichlet Mixture Model Based on Minimum Message Length , 2007, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[72]  Zhiwu Lu,et al.  Generalized Competitive Learning of Gaussian Mixture Models , 2009, IEEE Transactions on Systems, Man, and Cybernetics, Part B (Cybernetics).

[73]  Martial Hebert,et al.  Efficient visual event detection using volumetric features , 2005, Tenth IEEE International Conference on Computer Vision (ICCV'05) Volume 1.