Efficient histogram dictionary learning for text/image modeling and classification

Text and image data are often represented effectively as histograms. For modeling histograms, recent Bayesian topic models such as latent Dirichlet allocation and its variants have been shown to be successful, but they often suffer from computational overhead because a large number of hidden variables must be inferred. In this paper we consider a different modeling strategy: forming a dictionary of base histograms whose convex combination yields the histogram of an observed text/image document. The dictionary entries are learned from data, which establishes a direct or indirect association between specific topics/keywords and the base histograms. Given a learned dictionary, the coding of an observed histogram provides succinct and salient information that is useful for classification. One of our main contributions is a very efficient dictionary learning algorithm based on Nesterov’s smooth optimization technique, used in conjunction with analytic solution methods for the quadratic minimization sub-problems. Beyond its faster theoretical convergence rate, our algorithm is also 20–30 times faster in actual running time than general-purpose optimizers such as interior-point methods. In classification/annotation tasks on several text/image datasets, our approach exhibits performance comparable to, and often better than, existing Bayesian models, while being significantly faster than their variational inference.
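To make the coding step concrete, the following is a minimal sketch of how an observed histogram might be expressed as a convex combination of base histograms. It is not the algorithm proposed in the paper: it substitutes plain projected gradient descent for the Nesterov-based method, and it assumes a squared-error objective, which the abstract does not specify. The function names (`project_to_simplex`, `reconstruct`, `encode`) and the toy dictionary are hypothetical.

```python
def project_to_simplex(v):
    """Euclidean projection of v onto the probability simplex
    (non-negative entries that sum to 1), using the sort-based method."""
    u = sorted(v, reverse=True)
    cumulative = 0.0
    theta = 0.0
    for i, u_i in enumerate(u, start=1):
        cumulative += u_i
        t = (cumulative - 1.0) / i
        if u_i - t > 0:
            theta = t  # keep the threshold of the last index that qualifies
    return [max(x - theta, 0.0) for x in v]


def reconstruct(dictionary, code):
    """Convex combination of the base histograms, weighted by the code."""
    bins = len(dictionary[0])
    return [sum(c * base[b] for c, base in zip(code, dictionary))
            for b in range(bins)]


def encode(histogram, dictionary, steps=500):
    """Find simplex-constrained weights that approximate `histogram`.

    Assumed objective (not taken from the paper): 0.5 * squared error
    between the reconstruction and the observed histogram.
    """
    k = len(dictionary)
    # Each base histogram has squared norm at most 1, so the gradient's
    # Lipschitz constant is at most k and 1/k is a safe step size.
    step_size = 1.0 / k
    code = [1.0 / k] * k  # start from the uniform mixture
    for _ in range(steps):
        approx = reconstruct(dictionary, code)
        residual = [a - h for a, h in zip(approx, histogram)]
        grad = [sum(x * r for x, r in zip(base, residual))
                for base in dictionary]
        code = project_to_simplex(
            [c - step_size * g for c, g in zip(code, grad)])
    return code


# Toy dictionary: three base histograms over four bins, each summing to 1.
dictionary = [
    [0.7, 0.2, 0.1, 0.0],
    [0.1, 0.1, 0.4, 0.4],
    [0.0, 0.5, 0.0, 0.5],
]
true_code = [0.6, 0.3, 0.1]
observed = reconstruct(dictionary, true_code)
estimated_code = encode(observed, dictionary)
print([round(c, 3) for c in estimated_code])
```

Because the observed histogram here is built exactly from the toy dictionary, the estimated code should approach the weights used to generate it. The resulting weight vector is the kind of compact representation that the abstract describes passing on to a classifier.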
