Convergence of artificial intelligence and high performance computing on NSF-supported cyberinfrastructure

Significant investments to upgrade and construct large-scale scientific facilities demand commensurate investments in R&D to design the algorithms and computing approaches needed to enable scientific and engineering breakthroughs in the big data era. Innovative artificial intelligence (AI) applications have powered transformational solutions to big data challenges in industry and technology, now drive a multi-billion-dollar industry, and play an ever-increasing role in shaping human social patterns. As AI continues to evolve into a computing paradigm endowed with statistical and mathematical rigor, it has become apparent that single-GPU solutions for training, validation, and testing are no longer sufficient for the computational grand challenges posed by scientific facilities that produce data at a rate and volume that outstrip the computing capabilities of available cyberinfrastructure platforms. This realization has driven the confluence of AI and high performance computing (HPC) to reduce time-to-insight and to enable systematic studies of domain-inspired AI architectures and optimization schemes for data-driven discovery. In this article we summarize recent developments in this field and describe specific advances that the authors are spearheading to accelerate and streamline the use of HPC platforms for designing and applying accelerated AI algorithms in academia and industry.
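
To make the single-GPU-to-HPC transition concrete, the listing below is a minimal sketch of data-parallel training using Horovod with PyTorch, two tools cited in this article. It is illustrative only: the model, dataset, batch size, and learning-rate scaling are placeholder assumptions, not the authors' actual workflow, and the sketch assumes one GPU per launched process.

    # Minimal data-parallel training sketch with Horovod + PyTorch.
    # The model and dataset are illustrative placeholders.
    import torch
    import torch.nn as nn
    import torch.optim as optim
    import horovod.torch as hvd
    from torch.utils.data import DataLoader, TensorDataset
    from torch.utils.data.distributed import DistributedSampler

    hvd.init()                               # one worker process per GPU
    torch.cuda.set_device(hvd.local_rank())  # pin this process to its local GPU

    # Placeholder data and model (replace with a real scientific dataset/architecture).
    train_set = TensorDataset(torch.randn(1024, 32), torch.randint(0, 2, (1024,)))
    model = nn.Sequential(nn.Linear(32, 64), nn.ReLU(), nn.Linear(64, 2)).cuda()

    # Shard the data so each worker sees a distinct subset every epoch.
    sampler = DistributedSampler(train_set, num_replicas=hvd.size(), rank=hvd.rank())
    loader = DataLoader(train_set, batch_size=64, sampler=sampler)

    # Scale the learning rate by the number of workers (a common heuristic),
    # then wrap the optimizer so gradients are allreduce-averaged before each update.
    optimizer = optim.SGD(model.parameters(), lr=0.01 * hvd.size())
    optimizer = hvd.DistributedOptimizer(optimizer, named_parameters=model.named_parameters())

    # Start all workers from identical weights and optimizer state.
    hvd.broadcast_parameters(model.state_dict(), root_rank=0)
    hvd.broadcast_optimizer_state(optimizer, root_rank=0)

    loss_fn = nn.CrossEntropyLoss()
    for epoch in range(5):
        sampler.set_epoch(epoch)             # reshuffle shards each epoch
        for x, y in loader:
            x, y = x.cuda(), y.cuda()
            optimizer.zero_grad()
            loss = loss_fn(model(x), y)
            loss.backward()
            optimizer.step()
        if hvd.rank() == 0:
            print(f"epoch {epoch}: loss {loss.item():.4f}")

Launched with, e.g., horovodrun -np 4 python train.py, each process drives one GPU and gradient averaging happens transparently at every step; the same pattern scales from a single multi-GPU node to an HPC cluster.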
