Neural Information Processing

In this paper, instead of modifying the framework of the extreme learning machine (ELM), we propose a learning algorithm, ELM with Synthetic Instances Generation (SIGELM), that improves the generalization ability of ELM. We optimize the output-layer weights by adding informative synthetic instances to the training dataset at each learning step. To obtain these synthetic instances, a neighborhood is determined for each high-uncertainty training sample, and the synthetic instances within that neighborhood that enhance the training performance of ELM are selected. Experimental results on 4 representative regression datasets from KEEL demonstrate that the proposed SIGELM clearly improves the generalization capability of ELM and effectively reduces over-fitting.
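The loop described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not specify how uncertainty is measured, how the neighborhood is built, how synthetic targets are assigned, or how "enhancing performance" is scored. The sketch assumes, purely for illustration, that uncertainty is the absolute training residual, that the neighborhood is a uniform box of half-width `radius` around the sample, that a synthetic instance inherits the target of its source sample, and that a candidate is kept only if it lowers the error on a held-out validation split. All function and parameter names (`sigelm_sketch`, `top_k`, `n_candidates`, `radius`) are hypothetical.

```python
import numpy as np


def hidden(X, W, b):
    # Random-feature hidden layer of an ELM; W and b stay fixed after sampling.
    return np.tanh(X @ W + b)


def fit_output_weights(H, y, ridge=1e-6):
    # Ridge-regularized least squares for the output-layer weights.
    A = H.T @ H + ridge * np.eye(H.shape[1])
    return np.linalg.solve(A, H.T @ y)


def mse(H, y, beta):
    return float(np.mean((H @ beta - y) ** 2))


def sigelm_sketch(X, y, X_val, y_val, n_hidden=10, n_steps=3,
                  top_k=3, n_candidates=5, radius=0.05, seed=0):
    rng = np.random.default_rng(seed)
    W = rng.normal(size=(X.shape[1], n_hidden))
    b = rng.normal(size=n_hidden)
    H_val = hidden(X_val, W, b)

    X_aug, y_aug = X.copy(), y.copy()
    beta = fit_output_weights(hidden(X_aug, W, b), y_aug)
    val_before = best = mse(H_val, y_val, beta)

    for _ in range(n_steps):
        # ASSUMPTION: uncertainty is approximated by the absolute residual.
        residual = np.abs(hidden(X, W, b) @ beta - y)
        for i in np.argsort(residual)[-top_k:]:
            # ASSUMPTION: the neighborhood is a uniform box around X[i].
            candidates = X[i] + rng.uniform(
                -radius, radius, size=(n_candidates, X.shape[1]))
            for x_new in candidates:
                # ASSUMPTION: a synthetic instance inherits the target y[i].
                X_try = np.vstack([X_aug, x_new])
                y_try = np.append(y_aug, y[i])
                beta_try = fit_output_weights(hidden(X_try, W, b), y_try)
                score = mse(H_val, y_val, beta_try)
                # ASSUMPTION: keep the candidate only if validation MSE drops.
                if score < best:
                    X_aug, y_aug, beta, best = X_try, y_try, beta_try, score

    return {"beta": beta, "X_aug": X_aug, "y_aug": y_aug,
            "val_mse_before": val_before, "val_mse_after": best}
```

Because the hidden layer is fixed, only the output-layer weights are refit when a candidate is tried, which matches the abstract's focus on the output layer. A held-out split is used for selection here because an unregularized least-squares fit already minimizes the error on the original training set, so a pure training-error criterion would rarely accept any candidate.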
