Classification methods and inductive learning rules: what we may learn from theory

Inductive learning methods allow the system designer to infer a model of the relevant phenomena of an unknown process by extracting information from experimental data. A wide range of inductive learning methods is now available, offering potentially different levels of accuracy on different problem domains. In this critical review of theoretical results obtained over the last decade, we address the problem of designing an inductive classification system with optimal accuracy when domain knowledge is limited and the number of available experiments is possibly small. By analyzing the formal properties of consistent learning methods and of accuracy estimators, we argue that the common practice of aggressively pursuing error minimization across different training algorithms and classifier families is unjustified.
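
To make this concrete, consider how coarse an accuracy estimate is when the number of available experiments is small. The sketch below (plain Python; the function name, the sample sizes, and the choice of a Hoeffding-style bound are illustrative assumptions, not taken from the paper) computes the two-sided confidence margin for an error rate measured on a held-out test set. With a few hundred test samples the margin dwarfs the few percentage points that typically separate competing classifier families, which is the kind of observation behind the claim above.

import math

def hoeffding_error_margin(n_test: int, delta: float = 0.05) -> float:
    """Two-sided Hoeffding confidence margin for an empirical error rate
    measured on n_test i.i.d. held-out samples: with probability at least
    1 - delta, the true error lies within +/- this margin of the estimate."""
    return math.sqrt(math.log(2.0 / delta) / (2.0 * n_test))

if __name__ == "__main__":
    # Illustrative sample sizes (hypothetical, not reported in the paper).
    for n in (50, 200, 1000, 10000):
        print(f"n = {n:5d}: empirical error is informative only to within "
              f"+/- {hoeffding_error_margin(n):.3f}")

For n = 50 the margin is roughly 0.19, so two classifiers whose measured error rates differ by a few percent are statistically indistinguishable at that sample size.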
