Transductive Rademacher Complexity and Its Applications

We develop a technique for deriving data-dependent error bounds for transductive learning algorithms based on transductive Rademacher complexity. The technique rests on two components. The first is a novel general error bound for transduction in terms of transductive Rademacher complexity. The second is a novel technique for bounding the Rademacher averages of particular algorithms in terms of their "unlabeled-labeled" representation. This technique applies to many advanced graph-based transductive algorithms, and we demonstrate its effectiveness by deriving error bounds for three well-known algorithms. Finally, we present a new PAC-Bayesian bound for mixtures of transductive algorithms that builds on our Rademacher bounds.
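The abstract does not define the central quantity. The sketch below assumes the definition commonly used for transductive Rademacher complexity. For a set V of vectors in R^(m+u) and a parameter p in [0, 1/2], R_{m+u}(V, p) = (1/m + 1/u) E_sigma[ sup_{v in V} sigma . v ]. Here the sigma_i are i.i.d. and equal +1 or -1 with probability p each, and 0 with probability 1 - 2p. The default choice of p is p0 = mu/(m+u)^2. The code estimates this quantity by Monte Carlo for a hypothetical norm-bounded linear class {U alpha : ||alpha||_2 <= B}. That class is only a simplified stand-in for an "unlabeled-labeled" representation, not the paper's exact construction. The matrix U, the radius B, and both function names are illustrative assumptions.

```python
import numpy as np


def transductive_rademacher_linear(U, B, m, u, p=None, n_samples=2000, seed=0):
    """Monte Carlo estimate of transductive Rademacher complexity for the
    hypothetical class V = {U @ alpha : ||alpha||_2 <= B}.

    U has m + u rows (one per labeled or unlabeled point). If p is None,
    the default p0 = m*u / (m+u)**2 is used.
    """
    n = m + u
    U = np.asarray(U, dtype=float)
    if U.shape[0] != n:
        raise ValueError("U must have m + u rows")
    if p is None:
        p = m * u / n**2
    if not 0.0 <= p <= 0.5:
        raise ValueError("p must lie in [0, 1/2]")
    rng = np.random.default_rng(seed)
    # Draw sigma_i i.i.d.: +1 w.p. p, -1 w.p. p, 0 w.p. 1 - 2p.
    sigma = rng.choice([1.0, -1.0, 0.0], size=(n_samples, n),
                       p=[p, p, 1.0 - 2.0 * p])
    # By Cauchy-Schwarz, sup over ||alpha||_2 <= B of sigma . (U alpha)
    # equals B * ||U^T sigma||_2, so no inner optimization is needed.
    sups = B * np.linalg.norm(sigma @ U, axis=1)
    return (1.0 / m + 1.0 / u) * sups.mean()


def jensen_upper_bound(U, B, m, u, p=None):
    """Closed-form upper bound on the same quantity.

    Jensen's inequality gives E||U^T sigma|| <= sqrt(E||U^T sigma||^2), and
    E[sigma sigma^T] = 2p I gives E||U^T sigma||^2 = 2p ||U||_F^2.
    """
    n = m + u
    if p is None:
        p = m * u / n**2
    U = np.asarray(U, dtype=float)
    return (1.0 / m + 1.0 / u) * B * np.sqrt(2.0 * p) * np.linalg.norm(U, "fro")


# Usage with a toy, purely illustrative U (identity on 20 points).
m, u = 10, 10
U = np.eye(m + u)
estimate = transductive_rademacher_linear(U, B=1.0, m=m, u=u)
bound = jensen_upper_bound(U, B=1.0, m=m, u=u)
print(f"Monte Carlo estimate: {estimate:.4f}, Jensen bound: {bound:.4f}")
```

The estimate should not exceed the Jensen bound, apart from sampling noise. The bound depends on U only through its Frobenius norm. This shows how a representation-based argument can turn a complexity term into a quantity that is computable from the full (labeled plus unlabeled) sample.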
