Rademacher Complexity Bounds for a Penalized Multi-class Semi-supervised Algorithm (Extended Abstract)

We propose Rademacher complexity bounds for multiclass classifiers trained with a two-step semi-supervised model. In the first step, the algorithm partitions the partially labeled data and then, using the labeled training examples, identifies dense clusters containing κ predominant classes such that the proportion of their non-predominant classes is below a fixed threshold. In the second step, a classifier is trained by minimizing a margin empirical loss over the labeled training set together with a penalization term measuring the inability of the learner to predict the κ predominant classes of the identified clusters. The resulting data-dependent generalization error bound involves the margin distribution of the classifier, the stability of the clustering technique used in the first step, and Rademacher complexity terms corresponding to the partially labeled training data. Our theoretical results exhibit convergence rates extending those proposed in the literature for the binary case, and experimental results on different multiclass classification problems provide empirical evidence that supports the theory.
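The two steps described above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `predominant_classes` and `cluster_penalty`, the use of `None` to mark unlabeled points, and the choice of a simple miss rate as the penalization term are all assumptions made for illustration. The clustering itself is taken as given (any partition of the data into cluster ids), and the margin loss of the second step is omitted.

```python
from collections import Counter


def predominant_classes(cluster_ids, labels, kappa, threshold):
    """Step 1 (sketch): keep clusters dominated by at most kappa classes.

    cluster_ids[i] is the cluster of point i; labels[i] is its class,
    or None if the point is unlabeled (an assumed convention).
    A cluster is kept when the proportion of its labeled points that fall
    outside its kappa most frequent classes is below `threshold`.
    Returns {cluster_id: set of predominant classes}.
    """
    counts = {}
    for c, y in zip(cluster_ids, labels):
        if y is not None:
            counts.setdefault(c, Counter())[y] += 1

    selected = {}
    for c, counter in counts.items():
        total = sum(counter.values())
        top = counter.most_common(kappa)
        covered = sum(n for _, n in top)
        if (total - covered) / total < threshold:
            selected[c] = {y for y, _ in top}
    return selected


def cluster_penalty(cluster_ids, predictions, selected):
    """Step 2 (sketch): fraction of points in the selected clusters whose
    predicted class is not among the cluster's predominant classes.
    This miss rate is an assumed stand-in for the penalization term."""
    pairs = [(c, p) for c, p in zip(cluster_ids, predictions) if c in selected]
    if not pairs:
        return 0.0
    misses = sum(1 for c, p in pairs if p not in selected[c])
    return misses / len(pairs)
```

In a training loop, a quantity like `cluster_penalty` would be added to the margin empirical loss on the labeled set, weighted by a trade-off hyperparameter; that weighting is likewise an assumption here rather than something the abstract specifies.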
