Stream-Based Extreme Learning Machine Approach for Big Data Problems

Big Data problems demand data models that can handle time-varying, massive, and high-dimensional data. In this context, Active Learning emerges as an attractive technique for building high-performance models from few labeled examples. Its importance for Big Data becomes even more evident when labeling costs are high and data reach the learner as streams. This paper presents a novel Active Learning method based on Extreme Learning Machines (ELMs) and Hebbian Learning. Linearization of the input data by a large ELM hidden layer makes our method relatively insensitive to parameter settings, and overfitting is inherently controlled via the crosstalk term of Hebbian Learning. We also demonstrate that a simple convergence test serves as an effective labeling criterion, since it indicates how many labels are needed for learning. The proposed method has inherent properties that make it highly attractive for Big Data: incremental learning from data streams, elimination of redundant patterns, and learning from a reduced, informative training set. Experimental results show that our method is competitive with several large-margin Active Learning strategies as well as with a linear SVM.
