Oriented principal component analysis for large margin classifiers

Large margin classifiers (such as MLPs) are designed to assign training samples with high confidence (or margin) to one of the classes. Recent theoretical results on these systems show why regularisation terms and feature extraction techniques can enhance their generalisation properties. The optimal subset of features depends not only on the classification problem but also on the particular classifier with which it is used. Global learning algorithms that train large margin classifiers jointly with a feature extractor are therefore desirable. A direct approach is to optimise a cost function based on the margin error that also incorporates regularisation terms for controlling capacity. These terms must penalise a classifier with the largest margin for the problem at hand. Our work shows that a PCA term can be employed for this purpose. PCA achieves an optimal discriminatory projection only for some particular data distributions, so the margin of the classifier can then be effectively controlled. We also propose a simple constrained search for the global algorithm in which the feature extractor and the classifier are trained separately. This gives some flexibility to include heuristics that can improve both the search and the performance of the resulting solution. Experimental results demonstrate the potential of the proposed method.
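
The abstract does not give the exact cost function or search procedure, so the following is only a hypothetical sketch of the general idea and not the paper's oriented PCA algorithm. It makes several assumptions. The feature extractor is a linear projection W (d x k), and the classifier is linear, (w, b), acting on the projected features. The margin error is a squared hinge loss with target margin rho. The regularisation term is the ordinary PCA reconstruction error weighted by lam. "Trained separately" is read as alternating gradient steps: first on W with the classifier fixed, then on (w, b) with W fixed. All function and parameter names are illustrative.

```python
import numpy as np

# Hypothetical sketch only: a margin-error cost plus a PCA regulariser,
# minimised by alternating extractor/classifier steps. Not the authors' method.


def margin_error(w, b, Z, y, rho=1.0):
    """Mean squared hinge loss: penalises samples whose margin y*(z.w + b) < rho."""
    r = np.maximum(0.0, rho - y * (Z @ w + b))
    return np.mean(r ** 2)


def pca_term(W, Xc):
    """Mean PCA reconstruction error of centred data Xc on the columns of W (d x k)."""
    R = Xc - Xc @ W @ W.T
    return np.mean(np.sum(R ** 2, axis=1))


def total_cost(W, w, b, Xc, y, lam, rho=1.0):
    # Margin error of the classifier on projected features + lam * PCA term.
    return margin_error(w, b, Xc @ W, y, rho) + lam * pca_term(W, Xc)


def train_alternating(X, y, k=2, lam=0.1, lr=0.01, n_iter=1000, rho=1.0, seed=0):
    """Alternate gradient steps: extractor W first, then classifier (w, b)."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    Xc = X - X.mean(axis=0)                          # PCA works on centred data
    W, _ = np.linalg.qr(rng.normal(size=(d, k)))     # random orthonormal start
    w, b = np.zeros(k), 0.0
    history = [total_cost(W, w, b, Xc, y, lam, rho)]
    for _ in range(n_iter):
        # Step 1: update the feature extractor W with the classifier held fixed.
        Z = Xc @ W
        g = -(2.0 / n) * np.maximum(0.0, rho - y * (Z @ w + b)) * y  # dE/d(score)
        R = Xc - Z @ W.T                                             # reconstruction residual
        grad_margin = np.outer(Xc.T @ g, w)
        grad_pca = -(2.0 / n) * (Xc.T @ R @ W + R.T @ Xc @ W)
        W = W - lr * (grad_margin + lam * grad_pca)
        # Step 2: update the classifier (w, b) with the feature extractor held fixed.
        Z = Xc @ W
        g = -(2.0 / n) * np.maximum(0.0, rho - y * (Z @ w + b)) * y
        w = w - lr * (Z.T @ g)
        b = b - lr * g.sum()
        history.append(total_cost(W, w, b, Xc, y, lam, rho))
    return W, w, b, history


# Toy usage: two Gaussian classes separated along the first coordinate.
rng = np.random.default_rng(1)
n, d = 200, 5
y = np.where(np.arange(n) < n // 2, 1.0, -1.0)
X = rng.normal(size=(n, d))
X[:, 0] += 3.0 * y
W, w, b, hist = train_alternating(X, y)
Xc = X - X.mean(axis=0)
acc = np.mean(np.sign(Xc @ W @ w + b) == y)
print(f"cost {hist[0]:.3f} -> {hist[-1]:.3f}, training accuracy {acc:.2f}")
```

In this toy setting, the class-separating direction is also the direction of largest variance, so the PCA term and the margin error pull W the same way. The abstract's remark that PCA is discriminatory only for particular data distributions corresponds to the case where these two directions disagree and lam trades one off against the other.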
