Some Theoretical Results Concerning the Convergence of Compositions of Regularized Linear Functions

Recently, sample complexity bounds have been derived for problems involving linear functions such as neural networks and support vector machines. In this paper, we extend some theoretical results in this area by deriving dimensional independent covering number bounds for regularized linear functions under certain regularization conditions. We show that such bounds lead to a class of new methods for training linear classifiers with similar theoretical advantages of the support vector machine. Furthermore, we also present a theoretical analysis for these new methods from the asymptotic statistical point of view. This technique provides better description for large sample behaviors of these algorithms.

[1]  John Shawe-Taylor,et al.  Structural Risk Minimization Over Data-Dependent Hierarchies , 1998, IEEE Trans. Inf. Theory.

[2]  Andrew R. Barron,et al.  Universal approximation bounds for superpositions of a sigmoidal function , 1993, IEEE Trans. Inf. Theory.

[3]  Peter L. Bartlett,et al.  Efficient agnostic learning of neural networks with bounded fan-in , 1996, IEEE Trans. Inf. Theory.

[4]  Peter L. Bartlett,et al.  The Sample Complexity of Pattern Classification with Neural Networks: The Size of the Weights is More Important than the Size of the Network , 1998, IEEE Trans. Inf. Theory.

[5]  Vladimir N. Vapnik,et al.  The Nature of Statistical Learning Theory , 2000, Statistics for Engineering and Information Science.

[6]  J. Lamperti ON CONVERGENCE OF STOCHASTIC PROCESSES , 1962 .

[7]  L. Jones A Simple Lemma on Greedy Approximation in Hilbert Space and Convergence Rates for Projection Pursuit Regression and Neural Network Training , 1992 .

[8]  Noga Alon,et al.  Scale-sensitive dimensions, uniform convergence, and learnability , 1997, JACM.

[9]  R. Dudley A course on empirical processes , 1984 .

[10]  Tong Zhang Analysis of Regularized Linear Functions for Classification Problems , 1999 .

[11]  David Haussler,et al.  Generalizing the PAC model: sample size bounds from metric dimension-based uniform convergence results , 1989, 30th Annual Symposium on Foundations of Computer Science.

[12]  Yoav Freund,et al.  Boosting the margin: A new explanation for the effectiveness of voting methods , 1997, ICML.

[13]  Norbert Sauer,et al.  On the Density of Families of Sets , 1972, J. Comb. Theory A.

[14]  Vladimir Vapnik,et al.  Chervonenkis: On the uniform convergence of relative frequencies of events to their probabilities , 1971 .