Evaluating POWER Architecture for Distributed Training of Generative Adversarial Networks

The increased availability of High-Performance Computing resources enables data scientists to deploy and evaluate data-driven approaches, notably in the field of deep learning, at a rapid pace. As deep neural networks become more complex and ingest increasingly large datasets, it becomes impractical to perform the training phase on single machine instances due to memory constraints and extremely long training times. Rather than scaling up, scaling out the computing resources is a productive approach to improving performance. The paradigm of data parallelism allows us to split the training dataset into manageable chunks that can be processed in parallel. In this work, we evaluate the scaling performance of training a 3D generative adversarial network (GAN) on an IBM POWER8 cluster equipped with 12 NVIDIA P100 GPUs. The full training duration of the GAN, including evaluation, is reduced from 20 h 16 min on a single GPU to 2 h 14 min on all 12 GPUs. Considering the training process alone, we achieve a scaling efficiency of 98.9% when scaling from 1 to 12 GPUs.
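
As a concrete illustration of this data-parallel scheme, the sketch below shows how dataset sharding, learning-rate scaling, and allreduce-based gradient averaging could be wired up with Horovod on TensorFlow/Keras. It is a minimal, hypothetical example, not the training code used in the paper: the model, dataset, optimizer, and hyperparameters are placeholders standing in for the actual 3D GAN.

```python
# Minimal data-parallel training sketch (hypothetical, not the paper's code).
# Assumes Horovod with the TensorFlow/Keras backend and one GPU per MPI rank.
import tensorflow as tf
import horovod.tensorflow.keras as hvd

hvd.init()

# Pin each worker to a single local GPU.
gpus = tf.config.experimental.list_physical_devices('GPU')
if gpus:
    tf.config.experimental.set_visible_devices(gpus[hvd.local_rank()], 'GPU')

# Placeholder network standing in for the 3D GAN generator/discriminator pair.
model = tf.keras.Sequential([
    tf.keras.layers.Dense(128, activation='relu', input_shape=(100,)),
    tf.keras.layers.Dense(1, activation='sigmoid'),
])

# Each rank processes a disjoint shard of the dataset (the "manageable chunks").
dataset = (tf.data.Dataset.from_tensor_slices(
               (tf.random.normal([1024, 100]), tf.ones([1024, 1])))
           .shard(num_shards=hvd.size(), index=hvd.rank())
           .shuffle(1024)
           .batch(32))

# Scale the learning rate with the number of workers and wrap the optimizer
# so gradients are averaged across ranks with an allreduce after each step.
opt = hvd.DistributedOptimizer(
    tf.keras.optimizers.Adam(learning_rate=1e-3 * hvd.size()))
model.compile(loss='binary_crossentropy', optimizer=opt)

# Broadcast the initial weights from rank 0 so all workers start identically.
callbacks = [hvd.callbacks.BroadcastGlobalVariablesCallback(0)]

model.fit(dataset, epochs=1, callbacks=callbacks,
          verbose=1 if hvd.rank() == 0 else 0)
```

Launched as, e.g., `horovodrun -np 12 python train.py`, each of the 12 ranks trains on its own data shard, and gradient averaging over all ranks keeps the model replicas synchronized after every batch.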
