Efficient Batch Parallel Online Sequential Extreme Learning Machine Algorithm Based on MapReduce

With the development of technology and the widespread use of machine learning, more and more models need to be trained to mine useful knowledge from large scale data. It has become a challenging problem to train multiple models accurately and efficiently so as to make full use of limited computing resources. As one of ELM variants, online sequential extreme learning machine (OS-ELM) provides a method to learn from incremental data. MapReduce, which provides a simple, scalable and fault-tolerant framework, can be utilized for large scale learning. In this paper, we propose an efficient batch parallel online sequential extreme learning machine (BPOS-ELM) algorithm for the training of multiple models. BPOS-ELM estimates the Map execution time and Reduce execution time with historical statistics and generates execution plan. BPOS-ELM launches one MapReduce job to train multiple OS-ELM models according to the generated execution plan. BPOS-ELM is evaluated with real and synthetic data. The accuracy of BPOS-ELM is at the same level as those of OS-ELM and POS-ELM. The speedup of BPOS-ELM reaches 10 on a cluster with maximum 32 cores.