Dimensionality-Dependent Generalization Bounds for k-Dimensional Coding Schemes

The k-dimensional coding schemes refer to a collection of methods that attempt to represent data using a set of representative k-dimensional vectors; they include nonnegative matrix factorization, dictionary learning, sparse coding, k-means clustering, and vector quantization as special cases. Previous generalization bounds for the reconstruction error of k-dimensional coding schemes are mainly dimensionality independent. A major advantage of these bounds is that they can be used to analyze the generalization error when data are mapped into an infinite- or high-dimensional feature space. However, many applications use finite-dimensional data features. Can we obtain dimensionality-dependent generalization bounds for k-dimensional coding schemes that are tighter than dimensionality-independent bounds when data lie in a finite-dimensional feature space? Yes. In this letter, we address this problem and derive a dimensionality-dependent generalization bound for k-dimensional coding schemes by bounding the covering number of the loss function class induced by the reconstruction error. The bound is of order O((mk ln(mkn)/n)^λ_n), where m is the dimension of the features, k is the number of columns in the linear implementation of the coding schemes, n is the sample size, and λ_n > 0.5 when n is finite and λ_n = 0.5 when n is infinite. We show that our bound can be tighter than previous results because it avoids inducing the worst-case upper bound on k of the loss function. The proposed generalization bound is also applied to some specific coding schemes to demonstrate that the dimensionality-dependent bound is an indispensable complement to the dimensionality-independent generalization bounds.
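
To make the quantities in the abstract concrete, the following minimal Python sketch (the synthetic data, sizes, and the fixed exponent value are illustrative assumptions, not the letter's actual construction) computes the empirical reconstruction error for the vector-quantization special case, in which each sample is reconstructed by its nearest column of a codebook D in R^{m×k}, and evaluates the order (mk ln(mkn)/n)^λ_n of the dimensionality-dependent bound.

```python
import numpy as np


def empirical_reconstruction_error(X, D):
    """Average reconstruction error of a k-dimensional coding scheme for the
    vector-quantization / k-means special case, where each sample is coded by
    the nearest column of the codebook D.

    X : (n, m) array of n samples with m-dimensional features.
    D : (m, k) linear implementation of the coding scheme (k codewords).
    """
    # Squared distance from every sample to every codeword: shape (n, k).
    sq_dists = ((X[:, None, :] - D.T[None, :, :]) ** 2).sum(axis=2)
    # Each sample uses its nearest codeword; average over the sample.
    return sq_dists.min(axis=1).mean()


def dimensionality_dependent_rate(m, k, n, lam=0.5):
    """Order of the bound discussed above: (m*k*ln(m*k*n) / n) ** lam.

    lam stands in for the exponent lambda_n (> 0.5 for finite n, equal to
    0.5 in the limit n -> infinity); 0.5 here is an illustrative choice.
    """
    return (m * k * np.log(m * k * n) / n) ** lam


# Illustrative usage on synthetic data; all sizes are arbitrary assumptions.
rng = np.random.default_rng(0)
m, k, n = 20, 5, 1000
X = rng.normal(size=(n, m))
D = rng.normal(size=(m, k))
print("empirical reconstruction error:", empirical_reconstruction_error(X, D))
print("rate (mk ln(mkn)/n)^0.5:", dimensionality_dependent_rate(m, k, n))
```

For the other special cases named above, such as dictionary learning or sparse coding, the minimization over codes would range over a constrained set of coefficient vectors rather than over single columns of D.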
