Auto associative Extreme Learning Machine based non-linear principal component regression for big data applications

In this paper, we propose a hybrid model, AAELM+MLR, that combines the Auto Associative Extreme Learning Machine (AAELM) with Multiple Linear Regression (MLR) to perform regression on big data. The model runs on the Hadoop MapReduce parallel computing framework and is implemented in Python using the Dumbo API. It works in two phases. In the first phase, a three-layered AAELM is trained, and the outputs of its hidden nodes are treated as non-linear principal components (NLPCs). In the second phase, an MLR model is fitted using these NLPCs as input variables. The effectiveness of the AAELM+MLR model is demonstrated on two large datasets taken from the web, viz., the airline flight delay dataset and the gas sensor array dataset. AAELM+MLR outperformed the MLR model, yielding lower average mean squared error (MSE) and mean absolute percentage error (MAPE) values under the 10-fold cross-validation framework. A statistical test confirms its superiority at the 1% level of significance.
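The two phases described above can be illustrated with a minimal single-machine sketch in NumPy. It is not the MapReduce/Dumbo implementation from the paper. The sigmoid activation, the uniform weight initialization, the use of a pseudoinverse for the output weights, and all function names (`train_aaelm`, `nlpcs`, `fit_mlr`, `predict_mlr`, `mse`, `mape`) are assumptions made for illustration only.

```python
import numpy as np


def sigmoid(z):
    """Logistic activation (assumed; the abstract does not name one)."""
    return 1.0 / (1.0 + np.exp(-z))


def train_aaelm(X, n_hidden, rng):
    """Phase 1: train a three-layer auto-associative ELM (target = input).

    Input weights and biases are drawn at random and left fixed; only the
    output weights are solved for, via the Moore-Penrose pseudoinverse.
    """
    n_features = X.shape[1]
    W = rng.uniform(-1.0, 1.0, size=(n_features, n_hidden))
    b = rng.uniform(-1.0, 1.0, size=n_hidden)
    H = sigmoid(X @ W + b)            # hidden-node outputs
    beta = np.linalg.pinv(H) @ X      # output weights reconstructing X
    return W, b, beta


def nlpcs(X, W, b):
    """Hidden-node outputs, treated as non-linear principal components."""
    return sigmoid(X @ W + b)


def fit_mlr(H, y):
    """Phase 2: ordinary least squares on the NLPCs, with an intercept."""
    A = np.column_stack([np.ones(len(H)), H])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef


def predict_mlr(H, coef):
    """Apply the fitted linear model to a matrix of NLPCs."""
    return coef[0] + H @ coef[1:]


def mse(y, y_hat):
    """Mean squared error."""
    return float(np.mean((y - y_hat) ** 2))


def mape(y, y_hat):
    """Mean absolute percentage error; assumes no target equals zero."""
    return float(np.mean(np.abs((y - y_hat) / y)) * 100.0)
```

A typical use would call `train_aaelm` on the training inputs, pass both training and test inputs through `nlpcs` with the same `W` and `b`, fit `fit_mlr` on the training NLPCs, and report `mse` and `mape` on the held-out fold. In a MapReduce setting the matrix products would presumably be accumulated across mappers, which this sketch does not attempt.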
