Learning Everywhere: Pervasive Machine Learning for Effective High-Performance Computation

The convergence of HPC and data-intensive methodologies provides a promising approach to major performance improvements. This paper provides a general description of the interaction between traditional HPC and ML approaches and motivates the "Learning Everywhere" paradigm for HPC. We introduce the concept of "effective performance" that one can achieve by combining learning methodologies with simulation-based approaches, and distinguish it from traditional performance as measured by benchmark scores. To support the promise of integrating HPC and learning methods, this paper examines specific examples and opportunities across a series of domains. It concludes with a series of open challenges in software systems, methods, and infrastructure that the Learning Everywhere paradigm presents.
