Physical Gradients for Deep Learning

Solving inverse problems, such as parameter estimation and optimal control, is a vital part of science. Many experiments repeatedly collect data and employ machine learning algorithms to quickly infer solutions to the associated inverse problems. We find that state-of-the-art training techniques are not well suited to many problems that involve physical processes, since the magnitude and direction of the gradients can vary strongly. We propose a novel hybrid training approach that combines higher-order optimization methods with machine learning techniques. We replace the gradient of the physical process with a new construct, referred to as the physical gradient. This also allows us to introduce domain knowledge into training by incorporating priors about the solution space into the gradients. We demonstrate the capabilities of our method on a variety of canonical physical systems, showing that physical gradients yield significant improvements on a wide range of optimization and learning problems.
