Negative correlation learning in the extreme learning machine framework

Extreme learning machine (ELM) has been shown to be a suitable algorithm for classification problems. Several ensemble meta-algorithms have been developed to improve the generalization of ELM models. The ensemble approaches introduced in the ELM literature mainly derive from the boosting and bagging frameworks. The generalization of these methods relies on data sampling procedures, under the assumption that the training data are heterogeneous enough to set up diverse base learners. The proposed ELM ensemble model overcomes this strong assumption by using the negative correlation learning (NCL) framework. An alternative diversity metric based on the orthogonality of the outputs is proposed. The formulation of the error function allows an analytical solution for the parameters of the ELM base learners, which significantly reduces the computational burden of the standard NCL ensemble method. The proposed ensemble method has been validated in an experimental study on a variety of benchmark datasets, comparing it with existing ensemble methods in the ELM literature. The proposed method statistically outperforms the comparison ensemble methods in accuracy, while also reporting a competitive computational burden (especially when compared to the baseline NCL-inspired method).
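To make the idea concrete, below is a minimal sketch (not the paper's exact formulation) of a sequentially trained ELM ensemble in which each new base learner pays a quadratic penalty for producing outputs correlated with those of the already-trained learners. The sigmoid activation, the hyperparameter names alpha and gamma, and the precise form of the orthogonality penalty are illustrative assumptions; the point is that because the penalty stays quadratic in the output weights, each learner retains the kind of closed-form solution the abstract refers to.

```python
import numpy as np

# Hypothetical sketch of an NCL-style ELM ensemble with an orthogonality
# penalty. Hyperparameter names (alpha, gamma) and the exact penalty form
# are illustrative assumptions, not the paper's verbatim error function.

rng = np.random.default_rng(0)

def elm_hidden(X, W, b):
    """Random-feature hidden layer of a single ELM (sigmoid activation)."""
    return 1.0 / (1.0 + np.exp(-(X @ W + b)))

def fit_ncl_elm_ensemble(X, Y, n_learners=5, n_hidden=50,
                         alpha=1e-2, gamma=1e-1):
    """Train learners sequentially; each new learner is pushed toward
    output-space orthogonality with the previously trained learners."""
    n_features = X.shape[1]
    learners, prev_outputs = [], []
    for _ in range(n_learners):
        # Random input weights and biases: the defining trait of ELM.
        W = rng.standard_normal((n_features, n_hidden))
        b = rng.standard_normal(n_hidden)
        H = elm_hidden(X, W, b)
        # Ridge term plus an orthogonality penalty sum_j ||O_j^T H beta||^2.
        # Both terms are quadratic in beta, so the minimizer is the solution
        # of a single linear system: no gradient-based NCL loop is needed.
        A = H.T @ H + alpha * np.eye(n_hidden)
        for O in prev_outputs:
            M = H.T @ O                 # shape (n_hidden, n_outputs)
            A += gamma * (M @ M.T)
        beta = np.linalg.solve(A, H.T @ Y)
        prev_outputs.append(H @ beta)
        learners.append((W, b, beta))
    return learners

def predict(learners, X):
    """Average the base learners' outputs (simple ensemble fusion)."""
    return np.mean([elm_hidden(X, W, b) @ beta for W, b, beta in learners],
                   axis=0)

# Toy usage: binary classification with one-hot targets.
X = rng.standard_normal((200, 4))
Y = np.eye(2)[(X[:, 0] + X[:, 1] > 0).astype(int)]
ens = fit_ncl_elm_ensemble(X, Y)
acc = np.mean(predict(ens, X).argmax(1) == Y.argmax(1))
print(f"train accuracy: {acc:.2f}")
```

The key line is the construction of A: the diversity term is folded into the same linear system as the ridge term, which is what makes the per-learner cost comparable to training independent ELMs rather than to iterative NCL training.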
