Dimensionality-Dependent Generalization Bounds for k-Dimensional Coding Schemes

The k-dimensional coding schemes refer to a collection of methods that attempt to represent data using a set of representative k-dimensional vectors; they include nonnegative matrix factorization, dictionary learning, sparse coding, k-means clustering, and vector quantization as special cases. Previous generalization bounds for the reconstruction error of k-dimensional coding schemes are mainly dimensionality independent. A major advantage of these bounds is that they can be used to analyze the generalization error when data are mapped into an infinite- or high-dimensional feature space. However, many applications use finite-dimensional data features. Can we obtain dimensionality-dependent generalization bounds for k-dimensional coding schemes that are tighter than dimensionality-independent bounds when data lie in a finite-dimensional feature space? Yes. In this letter, we address this problem and derive a dimensionality-dependent generalization bound for k-dimensional coding schemes by bounding the covering number of the loss function class induced by the reconstruction error. The bound is of order O((mk ln(mkn)/n)^λ_n), where m is the dimension of the features, k is the number of columns in the linear implementation of the coding schemes, n is the sample size, and λ_n > 0.5 when n is finite and λ_n = 0.5 when n is infinite. We show that our bound can be tighter than previous results because it avoids inducing the worst-case upper bound on k of the loss function. The proposed generalization bound is also applied to some specific coding schemes to demonstrate that the dimensionality-dependent bound is an indispensable complement to the dimensionality-independent generalization bounds.
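
To make the quantities in the abstract concrete, the following minimal Python sketch (the synthetic data, sizes, and the fixed exponent value are illustrative assumptions, not the letter's actual construction) computes the empirical reconstruction error for the vector-quantization special case, in which each sample is reconstructed by its nearest column of a codebook D in R^{m×k}, and evaluates the order (mk ln(mkn)/n)^λ_n of the dimensionality-dependent bound.

```python
import numpy as np


def empirical_reconstruction_error(X, D):
    """Average reconstruction error of a k-dimensional coding scheme for the
    vector-quantization / k-means special case, where each sample is coded by
    the nearest column of the codebook D.

    X : (n, m) array of n samples with m-dimensional features.
    D : (m, k) linear implementation of the coding scheme (k codewords).
    """
    # Squared distance from every sample to every codeword: shape (n, k).
    sq_dists = ((X[:, None, :] - D.T[None, :, :]) ** 2).sum(axis=2)
    # Each sample uses its nearest codeword; average over the sample.
    return sq_dists.min(axis=1).mean()


def dimensionality_dependent_rate(m, k, n, lam=0.5):
    """Order of the bound discussed above: (m*k*ln(m*k*n) / n) ** lam.

    lam stands in for the exponent lambda_n (> 0.5 for finite n, equal to
    0.5 in the limit n -> infinity); 0.5 here is an illustrative choice.
    """
    return (m * k * np.log(m * k * n) / n) ** lam


# Illustrative usage on synthetic data; all sizes are arbitrary assumptions.
rng = np.random.default_rng(0)
m, k, n = 20, 5, 1000
X = rng.normal(size=(n, m))
D = rng.normal(size=(m, k))
print("empirical reconstruction error:", empirical_reconstruction_error(X, D))
print("rate (mk ln(mkn)/n)^0.5:", dimensionality_dependent_rate(m, k, n))
```

For the other special cases named above, such as dictionary learning or sparse coding, the minimization over codes would range over a constrained set of coefficient vectors rather than over single columns of D.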
