Accelerating ELM training over data streams

As a machine learning method, the extreme learning machine (ELM) is characterized by fast learning speed and high accuracy. With the explosive growth of data volume, running machine learning algorithms on distributed computing platforms has become an inevitable trend. Apache Flink is an open-source, stream-based distributed platform for massive data processing that offers good scalability, high throughput, and fault tolerance. In this paper, we first study the characteristics of ELM and of distributed computing platforms, and then propose FL-ELM, a distributed ELM framework based on Flink. We evaluate this framework with synthetic data on a 5-node distributed cluster. In summary, the advantages of the proposed framework are as follows: (1) FL-ELM always trains faster than its Spark-based counterpart; (2) FL-ELM scales better than its Spark-based counterpart.
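The abstract does not describe how FL-ELM splits ELM training across Flink workers. Earlier MapReduce-based distributed ELM work used one common decomposition. Each data partition computes the small matrices U = H^T H and V = H^T T from its own rows of the hidden-layer output matrix H. The partial results are summed, and the output weights are solved once. The single-process Python sketch below illustrates that idea under stated assumptions. It is not FL-ELM's implementation and uses no Flink or Spark API. The helper names, the sigmoid activation, the uniform random hidden layer, and the ridge parameter C (giving beta = (I/C + U)^-1 V) are all hypothetical, illustrative choices.

```python
import math
import random
from functools import reduce


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def init_hidden_layer(n_features, n_hidden, seed=0):
    # Input weights and biases are random and never trained (the core ELM idea)
    rng = random.Random(seed)
    weights = [[rng.uniform(-1, 1) for _ in range(n_features)] for _ in range(n_hidden)]
    biases = [rng.uniform(-1, 1) for _ in range(n_hidden)]
    return weights, biases


def hidden_row(x, weights, biases):
    # One row of the hidden-layer output matrix H for a single sample x
    return [sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)
            for w, b in zip(weights, biases)]


def partial_stats(partition, weights, biases):
    # Hypothetical "map" step: a worker computes U_k = H_k^T H_k and V_k = H_k^T T_k locally
    n_hidden, n_out = len(weights), len(partition[0][1])
    U = [[0.0] * n_hidden for _ in range(n_hidden)]
    V = [[0.0] * n_out for _ in range(n_hidden)]
    for x, t in partition:
        h = hidden_row(x, weights, biases)
        for i in range(n_hidden):
            for j in range(n_hidden):
                U[i][j] += h[i] * h[j]
            for j in range(n_out):
                V[i][j] += h[i] * t[j]
    return U, V


def add_stats(a, b):
    # Hypothetical "reduce" step: element-wise sums, so partial results merge in any order
    (Ua, Va), (Ub, Vb) = a, b
    U = [[p + q for p, q in zip(ra, rb)] for ra, rb in zip(Ua, Ub)]
    V = [[p + q for p, q in zip(ra, rb)] for ra, rb in zip(Va, Vb)]
    return U, V


def solve(A, B):
    # Gauss-Jordan elimination with partial pivoting; returns X such that A X = B
    n = len(A)
    M = [ra[:] + rb[:] for ra, rb in zip(A, B)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        p = M[col][col]
        M[col] = [v / p for v in M[col]]
        for r in range(n):
            if r != col:
                f = M[r][col]
                M[r] = [v - f * c for v, c in zip(M[r], M[col])]
    return [row[n:] for row in M]


def train_output_weights(partitions, weights, biases, C=1e3):
    # beta = (I/C + sum_k U_k)^-1 (sum_k V_k); C is an assumed ridge parameter
    U, V = reduce(add_stats, (partial_stats(p, weights, biases) for p in partitions))
    for i in range(len(U)):
        U[i][i] += 1.0 / C
    return solve(U, V)


def predict(x, weights, biases, beta):
    # Network output for sample x: h(x) . beta
    h = hidden_row(x, weights, biases)
    return [sum(h[i] * beta[i][j] for i in range(len(h))) for j in range(len(beta[0]))]


# Usage: fit sin(x) on 61 points, once split over three partitions and once unsplit
data = [([k / 10.0], [math.sin(k / 10.0)]) for k in range(-30, 31)]
W, bias = init_hidden_layer(n_features=1, n_hidden=20)
beta_split = train_output_weights([data[:20], data[20:45], data[45:]], W, bias)
beta_whole = train_output_weights([data], W, bias)
```

Each partition's statistics are L x L and L x m (L hidden nodes, m outputs) no matter how many samples it holds. As a result, the merge step moves little data. The split and unsplit runs should give the same model up to floating-point rounding.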
