Effective Parallel Computing via a Free Stale Synchronous Parallel Strategy

As data grow larger and more complex, they are increasingly processed on distributed systems built on clusters. Because of power consumption, cost, and differing price-performance ratios, clusters are evolving into systems with heterogeneous hardware, which leads to performance differences among nodes. Even in a homogeneous cluster, node performance varies due to resource contention and communication costs. Nodes with poor performance drag down the efficiency of the whole system. Existing parallel computing strategies, such as the bulk synchronous parallel (BSP) strategy and the stale synchronous parallel (SSP) strategy, do not handle this problem well. To address it, we propose a free stale synchronous parallel (FSSP) strategy that frees the system from the negative impact of such nodes. FSSP extends the SSP strategy and can effectively and accurately identify slow nodes and eliminate their negative effects. We validated the performance of the FSSP strategy with several classical machine learning algorithms and datasets. Our experimental results show that FSSP is between 1.5$\times$ and 12$\times$ faster than the BSP and SSP strategies, and that it needs $4\times$ fewer iterations than the asynchronous parallel strategy to converge.
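
To make the idea concrete, below is a minimal sketch of how an SSP-style coordinator that additionally excludes persistently slow workers might be organized. It assumes a fixed staleness bound and a simple median-based slow-node heuristic; the class and method names (`FSSPCoordinator`, `may_proceed`, `slow_factor`) are illustrative and are not taken from the paper, whose exact detection rule may differ.

```python
from collections import defaultdict


class FSSPCoordinator:
    """Illustrative coordinator for a stale-synchronous-parallel scheme that
    also excludes persistently slow workers (stragglers) from synchronization.

    The staleness bound and the slow-node heuristic below are assumptions made
    for illustration, not the published FSSP mechanism.
    """

    def __init__(self, num_workers, staleness=3, slow_factor=2.0):
        self.num_workers = num_workers
        self.staleness = staleness            # max clock gap tolerated (SSP bound)
        self.slow_factor = slow_factor        # "slow" = this many times the median iteration time
        self.clocks = defaultdict(int)        # per-worker iteration counters
        self.iter_times = defaultdict(list)   # per-worker observed iteration times
        self.excluded = set()                 # workers currently ignored by synchronization

    def report_iteration(self, worker_id, seconds):
        """Record one finished iteration for a worker and refresh the exclusion set."""
        self.clocks[worker_id] += 1
        self.iter_times[worker_id].append(seconds)
        self._update_excluded()

    def _update_excluded(self):
        """Mark workers whose average iteration time is far above the median."""
        averages = {w: sum(t) / len(t) for w, t in self.iter_times.items() if t}
        if len(averages) < self.num_workers:
            return  # wait until every worker has reported at least once
        median = sorted(averages.values())[len(averages) // 2]
        self.excluded = {w for w, avg in averages.items()
                         if avg > self.slow_factor * median}

    def may_proceed(self, worker_id):
        """SSP rule: a worker may start its next iteration only if it is at most
        `staleness` iterations ahead of the slowest non-excluded worker."""
        active = [c for w, c in self.clocks.items() if w not in self.excluded]
        if not active:
            return True
        return self.clocks[worker_id] - min(active) <= self.staleness
```

Under this sketch, fast workers are no longer blocked waiting for excluded stragglers, which is the behavior the abstract attributes to FSSP relative to plain BSP and SSP.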
