Generalization performance of multiclass discriminant models

Starting from a direct definition of the notion of margin in the multiclass case, we study the generalization performance of multiclass discriminant systems. In the framework of statistical learning theory, we establish a bound on this performance based on covering numbers. Applying it to a linear ensemble method that estimates the class posterior probabilities gives us a way to compare this bound with another one based on combinatorial dimensions, with respect to the capacity measure each incorporates. Experimental results highlight the usefulness of both bounds for a real-world problem.
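
As an illustration of what a multiclass margin can look like, here is a minimal sketch. It assumes one common definition, which the abstract does not spell out and which may differ from the paper's own: the score of the true class minus the largest score among the other classes. The function names, the threshold `gamma`, and the strict inequality used for counting margin errors are all hypothetical choices made for this example.

```python
def multiclass_margin(scores, label):
    """Margin of one example: true-class score minus the best competing score.

    Assumed definition for illustration only; the paper may define it differently.
    """
    true_score = scores[label]
    best_other = max(s for k, s in enumerate(scores) if k != label)
    return true_score - best_other


def empirical_margin_error(score_rows, labels, gamma):
    """Fraction of examples whose margin falls strictly below gamma."""
    margins = [multiclass_margin(s, y) for s, y in zip(score_rows, labels)]
    return sum(m < gamma for m in margins) / len(margins)


# Two examples over three classes, both with true class 0.
rows = [[0.7, 0.2, 0.1], [0.4, 0.5, 0.1]]
labels = [0, 0]
print(empirical_margin_error(rows, labels, gamma=0.2))
```

Under this assumed definition, a positive margin means the example is correctly classified, and margin-based bounds of the kind described above typically trade this empirical margin error against a capacity term that depends on `gamma`.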
