Deep learning of support vector machines with class probability output networks

Deep learning methods endeavor to learn features automatically at multiple levels, allowing systems to learn complex functions that map the input space to the output space of the given data. The ability to learn powerful features automatically becomes increasingly important as the volume of data and the range of applications of machine learning methods continue to grow. This paper proposes a new deep architecture that uses support vector machines (SVMs) with class probability output networks (CPONs) to provide better generalization power for pattern classification problems. As a result, deep features are extracted without additional feature engineering steps, using multiple layers of SVM classifiers with CPONs. The proposed structure closely approaches the ideal Bayes classifier as the number of layers increases. Simulations on classification problems demonstrate the effectiveness of the proposed method.
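
A minimal sketch of the layer-wise idea described above, assuming each layer feeds the class probability outputs of its classifier forward as the input features of the next layer. Here scikit-learn's SVC(probability=True), which uses Platt scaling, stands in for the paper's CPON calibration, and the StackedSVM class name and layer count are illustrative assumptions, not the authors' implementation.

    # Sketch of stacked SVMs with probabilistic outputs (Python, scikit-learn).
    # ASSUMPTION: Platt-scaled probabilities replace the paper's CPON module.
    import numpy as np
    from sklearn.datasets import make_classification
    from sklearn.model_selection import train_test_split
    from sklearn.svm import SVC

    class StackedSVM:
        """Stack SVM layers; each layer consumes the class-probability
        outputs of the previous layer as its input features."""

        def __init__(self, n_layers=3):
            self.layers = [SVC(kernel="rbf", probability=True)
                           for _ in range(n_layers)]

        def fit(self, X, y):
            features = X
            for svm in self.layers:
                svm.fit(features, y)
                # Feed class probabilities forward as the next layer's input.
                # (In practice, held-out or cross-validated probabilities
                # would reduce overfitting between layers.)
                features = svm.predict_proba(features)
            return self

        def predict(self, X):
            features = X
            for svm in self.layers[:-1]:
                features = svm.predict_proba(features)
            return self.layers[-1].predict(features)

    # Toy usage on a synthetic two-class problem.
    X, y = make_classification(n_samples=400, n_features=20, random_state=0)
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
    model = StackedSVM(n_layers=3).fit(X_tr, y_tr)
    print("test accuracy:", (model.predict(X_te) == y_te).mean())

The design point this sketch tries to capture is that no hand-crafted feature engineering occurs between layers: each layer's only input beyond the raw data is the calibrated class-probability vector produced by the layer below it.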
