Deep learning with support vector data description

Abstract

Overfitting is one of the most critical problems for machine learning methods: the model reaches nearly perfect accuracy on the training data but performs poorly on unseen data. The problem is particularly severe in complex models with a large number of parameters. In this paper, we propose a deep neural network model that adopts the support vector data description (SVDD). The SVDD is a variant of the support vector machine for one-class classification problems that achieves high generalization performance by acquiring a maximal margin. The proposed model aims to retain the representational power of deep learning while maintaining generalization performance through the SVDD. Experimental results show that the proposed model can learn multiclass data without severe overfitting.
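For orientation, the standard SVDD formulation of Tax and Duin can be written as the optimization problem below. This is a sketch of the classical one-class SVDD only, not necessarily the exact objective used in the proposed deep model. The symbols are assumptions introduced here for illustration: a is the center of the hypersphere, R its radius, \xi_i are slack variables, C is a trade-off constant, and \phi is a feature map (for example, the representation produced by the network).

```latex
% Classical SVDD primal problem (illustrative; symbols are assumptions, see lead-in).
% Requires the amsmath package for the aligned environment.
\[
\begin{aligned}
\min_{R,\,a,\,\xi} \quad & R^2 + C \sum_{i=1}^{n} \xi_i \\
\text{s.t.} \quad & \lVert \phi(x_i) - a \rVert^2 \le R^2 + \xi_i, \\
                  & \xi_i \ge 0, \qquad i = 1, \dots, n
\end{aligned}
\]
```

Minimizing R^2 keeps the enclosing hypersphere as small as possible, while the slack terms allow some training points to fall outside it at a cost controlled by C.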
