Machine-Learning-Based Load Balancing for Community Ice Code Component in CESM

Load balancing scientific codes on massively parallel architectures is becoming increasingly challenging. In this paper, we focus on the Community Earth System Model, a widely used climate modeling code. It comprises six components, each of which exhibits different scalability patterns. Previously, an analytical performance model has been used to find optimal load-balancing parameter configurations for each component. For the Community Ice Code component, however, the analytical performance model is too restrictive to capture its scalability patterns. We therefore developed a machine-learning-based load-balancing algorithm. It fits a surrogate model to a small number of load-balancing configurations and their corresponding runtimes, and then uses this model to find high-quality parameter configurations. Compared with the current practice of expert-knowledge-based enumeration over feasible configurations, the machine-learning-based load-balancing algorithm requires six times fewer evaluations to find the optimal configuration.
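The surrogate-based search described in the abstract can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the two-parameter configuration space, the `synthetic_runtime` function standing in for an actual timed model run, and the k-nearest-neighbor regressor standing in for whatever machine-learning model the authors fit. It shows only the general loop: evaluate a few configurations, fit a surrogate, and spend each further evaluation on the configuration the surrogate predicts to be fastest.

```python
import random


def synthetic_runtime(config):
    """Hypothetical stand-in for running the ice component and timing it."""
    bx, by = config
    return (bx - 12) ** 2 + 2 * (by - 5) ** 2 + 30.0


def knn_predict(train, config, k=3):
    """Surrogate: mean runtime of the k nearest already-evaluated configurations."""
    def sq_dist(item):
        return sum((a - b) ** 2 for a, b in zip(item[0], config))

    nearest = sorted(train, key=sq_dist)[:k]
    return sum(runtime for _, runtime in nearest) / len(nearest)


def surrogate_search(feasible, evaluate, n_initial=8, n_iterations=10, seed=0):
    """Evaluate a small random sample, then repeatedly evaluate the
    configuration that the surrogate predicts to have the lowest runtime."""
    rng = random.Random(seed)
    evaluated = {c: evaluate(c) for c in rng.sample(feasible, n_initial)}
    for _ in range(n_iterations):
        candidates = [c for c in feasible if c not in evaluated]
        if not candidates:
            break
        train = list(evaluated.items())
        best_guess = min(candidates, key=lambda c: knn_predict(train, c))
        evaluated[best_guess] = evaluate(best_guess)
    best = min(evaluated, key=evaluated.get)
    return best, evaluated[best], len(evaluated)


# Hypothetical feasible set: 24 x 10 = 240 candidate configurations.
feasible = [(bx, by) for bx in range(1, 25) for by in range(1, 11)]
best_config, best_runtime, n_evals = surrogate_search(feasible, synthetic_runtime)
print(best_config, best_runtime, n_evals)
```

In this sketch the search spends 18 evaluations out of 240 feasible configurations; whether it reaches the true optimum depends on the surrogate and the runtime landscape, so the numbers here say nothing about the savings reported in the paper.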
