Distributed Averaging CNN-ELM for Big Data

Increasing the scalability of machine learning to handle large volumes of data is a challenging task, and the scale-up approach has inherent limitations. In this paper, we propose a scale-out approach for CNN-ELM based on MapReduce at the classifier level. The Map process trains a CNN-ELM model on one partition of the data, so many CNN-ELM models can be trained asynchronously. The Reduce process averages the weights of all trained CNN-ELM models to produce the final model. This approach saves considerable training time compared with training a single CNN-ELM model on the full data set, and it increases the scalability of machine learning by combining the scale-out and scale-up approaches. We verified our method in experiments on the extended MNIST and not-MNIST data sets. However, the approach has some drawbacks: the additional iteration and learning parameters must be chosen carefully, and the distribution of training data across partitions must be selected carefully. Further research on more complex image data sets is required.
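
The Map/Reduce split described above can be summarized in a few lines. Below is a minimal sketch, assuming a hypothetical train_cnn_elm helper that stands in for the per-partition CNN-ELM training (the actual CNN feature extraction and ELM output-weight solution are not shown); it illustrates the distributed-averaging scheme, not the authors' exact implementation.

```python
import numpy as np

def train_cnn_elm(X, y, n_weights=8, seed=0):
    """Hypothetical stand-in for training one CNN-ELM on a data partition.

    A real implementation would extract CNN features from X and solve the
    ELM output weights; here we simply return a placeholder weight vector.
    """
    rng = np.random.default_rng(seed)
    return rng.standard_normal(n_weights)

def map_phase(partitions):
    # Map: train one CNN-ELM per data partition. Each call is independent,
    # so the models can be trained asynchronously on separate workers.
    return [train_cnn_elm(X, y, seed=i) for i, (X, y) in enumerate(partitions)]

def reduce_phase(weight_sets):
    # Reduce: the element-wise average of the trained weight vectors
    # gives the final model weights.
    return np.mean(np.stack(weight_sets), axis=0)

if __name__ == "__main__":
    # Toy data split into three partitions (shapes are arbitrary here).
    rng = np.random.default_rng(42)
    partitions = [(rng.standard_normal((100, 784)), rng.integers(0, 10, 100))
                  for _ in range(3)]
    final_weights = reduce_phase(map_phase(partitions))
    print(final_weights)
```

Because the Reduce step is a plain average, the quality of the final model depends on how evenly the class distribution is spread across partitions, which is the data-distribution caveat noted above.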
