Accelerating Hyperparameter Optimisation with PyCOMPSs

Machine Learning applications now span multiple domains due to the increase in computational power of modern systems. There has been a recent surge of Machine Learning applications in High Performance Computing (HPC) in an attempt to speed up training. However, besides training, hyperparameter optimisation (HPO) is one of the most time-consuming and resource-intensive parts of a Machine Learning workflow. Numerous algorithms and tools exist to accelerate the process of finding the right parameters for a model. Most of these tools do not exploit the parallelism provided by modern systems and are serial or limited to a single node. The few that do offer distributed execution require considerable programming effort. There is, therefore, a need for a tool/scheme that can scale and leverage HPC infrastructures such as supercomputers, with minimal programming effort and little or no performance overhead. We present an HPO scheme built on top of PyCOMPSs, a programming model and runtime that aims to ease the development of parallel applications for distributed infrastructures. We show that PyCOMPSs is a powerful framework that can accelerate the process of hyperparameter optimisation across multiple devices and computing units. We also show that PyCOMPSs provides easy programmability, seamless distribution, and scalability, key features missing in existing tools. Furthermore, we perform a detailed performance analysis over different configurations to demonstrate the effectiveness of our approach.
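The following is a minimal sketch, not the authors' implementation, of how hyperparameter trials can be parallelised with PyCOMPSs: each call to a @task-decorated function becomes an asynchronous task that the COMPSs runtime schedules across the available nodes, and compss_wait_on synchronises the results. The search space and the placeholder objective below are illustrative assumptions, not part of the paper.

```python
# Hedged example: PyCOMPSs-style parallel grid search over hyperparameters.
# Assumes a working PyCOMPSs installation; the objective is a stand-in for
# real model training and validation.
from itertools import product

from pycompss.api.task import task
from pycompss.api.api import compss_wait_on


@task(returns=1)
def train_and_score(learning_rate, batch_size):
    # Placeholder objective standing in for "train a model with these
    # hyperparameters and return its validation score". A real study would
    # train e.g. a CNN on CIFAR-10 here.
    return -((learning_rate - 0.01) ** 2) - 1e-6 * (batch_size - 64) ** 2


def grid_search(learning_rates, batch_sizes):
    # Each call returns immediately with a future; the runtime executes the
    # tasks concurrently on the resources assigned to the COMPSs job.
    futures = {
        (lr, bs): train_and_score(lr, bs)
        for lr, bs in product(learning_rates, batch_sizes)
    }
    # Synchronise: block until every trial's score is available.
    scores = {cfg: compss_wait_on(f) for cfg, f in futures.items()}
    return max(scores, key=scores.get)


if __name__ == "__main__":
    best = grid_search([0.001, 0.01, 0.1], [32, 64, 128])
    print("Best configuration (learning_rate, batch_size):", best)
```

Because the parallelism is expressed only through the decorator, the same script runs unchanged on a laptop or on a supercomputer allocation, which is the programmability and scalability argument made above.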
