CIRA Guide to Custom Loss Functions for Neural Networks in Environmental Sciences - Version 1

Neural networks are increasingly used in environmental science applications. Neural network models are trained by minimizing a loss function, and because the loss function determines what exactly is being optimized, it must be chosen very carefully for environmental science applications. Standard loss functions do not cover all the needs of the environmental sciences. Scientists therefore need to be able to develop their own custom loss functions, so that they can implement many of the classic performance measures already developed in environmental science, including measures developed for spatial model verification. However, very few resources cover the basics of custom loss function development comprehensively, and to the best of our knowledge none focus on the needs of environmental scientists. This document seeks to fill this gap by providing a guide on how to write custom loss functions targeted toward environmental science applications. Topics include the basics of writing custom loss functions, common pitfalls, useful functions for building loss functions, examples such as the fractions skill score as a loss function, how to incorporate physical constraints, discretization and soft discretization, and concepts such as focal, robust, and adaptive losses. The examples in this guide are currently provided for Python with Keras and the TensorFlow backend, but the basic concepts also apply to other environments, such as Python with PyTorch. Similarly, while the sample loss functions provided here are from meteorology, they are just examples of how to create custom loss functions. Other fields in the environmental sciences have very similar needs for custom loss functions, e.g., for evaluating spatial forecasts effectively, and the concepts discussed here can be applied there as well. All code samples are provided in a GitHub repository.
∗CIRA: Cooperative Institute for Research in the Atmosphere, Colorado State University, Fort Collins, CO. ECE: Electrical and Computer Engineering, Colorado State University, Fort Collins, CO. NOAA-GSL: National Oceanic and Atmospheric Administration (NOAA), Global Systems Laboratory (GSL), Boulder, CO. CS: Computer Science, Colorado State University, Fort Collins, CO. CIRES: Cooperative Institute for Research in Environmental Sciences, University of Colorado Boulder, Boulder, CO.
arXiv:2106.09757v1 [cs.LG] 17 Jun 2021

1 Why would environmental scientists care about custom loss functions?

The use of neural networks in environmental science applications is growing at a rapid pace. To train a neural network, one must choose a cost function, called a loss function in the context of neural networks, which measures the error of the network's predictions. The neural network is then trained, i.e., its parameters are chosen through an iterative process such that the loss function, and thus the error, is minimized. It is crucial to choose the loss function very carefully for environmental applications, because it determines what exactly the neural network is optimizing. Many predefined loss functions exist. The most popular examples for regression (predicting a continuous value such as wind speed) include mean absolute error (MAE), mean squared error (MSE), and root mean squared error (RMSE). The most popular example for classification is cross-entropy. There are many other predefined loss functions, and their number keeps growing, but they do not cover everything environmental scientists care about, since they were developed for other applications.
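The standard losses named above can be written in a few lines. The sketch below uses NumPy rather than TensorFlow so that the arithmetic is easy to check. A Keras loss would take the same `(y_true, y_pred)` arguments but would be written with TensorFlow operations so that gradients can be computed. The last lines illustrate the claim that the loss determines what is optimized. For a fixed set of observations, the best constant forecast under MSE is their mean, while under MAE it is their median. The wind values are made-up toy data, and the grid search stands in for training.

```python
import numpy as np

def mae(y_true, y_pred):
    """Mean absolute error."""
    return np.mean(np.abs(y_true - y_pred))

def mse(y_true, y_pred):
    """Mean squared error."""
    return np.mean((y_true - y_pred) ** 2)

def rmse(y_true, y_pred):
    """Root mean squared error."""
    return np.sqrt(mse(y_true, y_pred))

def binary_cross_entropy(y_true, p_pred, eps=1e-7):
    """Binary cross-entropy for 0/1 labels and predicted probabilities.

    Probabilities are clipped away from 0 and 1 to avoid log(0)."""
    p = np.clip(p_pred, eps, 1.0 - eps)
    return -np.mean(y_true * np.log(p) + (1.0 - y_true) * np.log(1.0 - p))

# Made-up wind speeds (m/s), including one strong gust.
wind = np.array([2.0, 3.0, 3.0, 4.0, 18.0])

# Search for the single constant forecast that minimizes each loss.
candidates = np.linspace(0.0, 20.0, 2001)  # spacing of 0.01 m/s
best_mse = candidates[np.argmin([mse(wind, c) for c in candidates])]
best_mae = candidates[np.argmin([mae(wind, c) for c in candidates])]
print(f"MSE-optimal constant: {best_mse:.2f}")  # mean of wind: 6.00
print(f"MAE-optimal constant: {best_mae:.2f}")  # median of wind: 3.00
```

The single gust pulls the MSE-optimal forecast up to 6 m/s, while the MAE-optimal forecast stays at the typical value of 3 m/s. Two losses computed on the same data can therefore steer a network toward quite different behavior.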
In fact, environmental scientists have a long tradition of developing meaningful performance measures for forecasting tasks. Examples include measures for single-category forecasts (accuracy, frequency bias, probability of detection, success ratio, etc.), multi-category forecasts (Heidke score, Gerrity score, etc.), continuous forecasts (correlation, reliability diagram, etc.), probabilistic forecasts (reliability diagram, Brier score, etc.), and spatial forecasts (neighborhood methods, such as the fractions skill score, and scale decomposition, such as wavelet decomposition), among many others. For a comprehensive overview of these and other performance measures, see the guide of the WWRP/WGNE Joint Working Group on Forecast Verification Research at https://www.cawcr.gov.au/projects/verification/. However, it is not obvious which of these performance measures can be used in a neural network loss function, or how to implement them, for the following reasons:
• There are various limitations on what can be implemented in a neural network loss function. Functions must be differentiable and must execute very quickly, which makes implementing custom loss functions tricky.
• The loss functions required by environmental scientists are unlike the loss functions typically used in computer science, and the community has not yet developed comprehensive resources, such as a large collection of customized loss functions.
For these reasons, custom loss functions are a significant hurdle for practitioners who want to use meaningful loss functions in their applications. We seek to close this gap by providing comprehensive instructions on how to code custom loss functions, including many examples and a discussion of common pitfalls. While the sample loss functions provided here are from meteorology, they are just examples of how to create custom loss functions.
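The differentiability requirement explains why single-category scores such as probability of detection (POD) cannot be dropped into a loss unchanged. The following is a NumPy sketch using made-up data. It computes the contingency-table scores listed above from a thresholded forecast. It then compares that forecast with a "soft" version in which a steep sigmoid replaces the hard threshold. This sigmoid substitution is one common soft-discretization approach and is not necessarily the one used later in this guide. The `sharpness` parameter is a hypothetical choice.

```python
import numpy as np

def contingency_scores(obs, fcst):
    """Single-category scores from observed and forecast event fields.

    Works for binary (0/1) forecasts and, unchanged, for soft forecasts
    in [0, 1]. Assumes at least one observed and one forecast event,
    otherwise some denominators are zero."""
    hits = np.sum(fcst * obs)
    false_alarms = np.sum(fcst * (1.0 - obs))
    misses = np.sum((1.0 - fcst) * obs)
    correct_negatives = np.sum((1.0 - fcst) * (1.0 - obs))
    return {
        "accuracy": (hits + correct_negatives) / obs.size,
        "frequency_bias": (hits + false_alarms) / (hits + misses),
        "pod": hits / (hits + misses),
        "success_ratio": hits / (hits + false_alarms),
    }

def hard_forecast(prob, threshold=0.5):
    # Step function: its derivative is zero almost everywhere, so it
    # gives a neural network no gradient to learn from.
    return (prob >= threshold).astype(float)

def soft_forecast(prob, threshold=0.5, sharpness=20.0):
    # Steep sigmoid approximating the step. 'sharpness' is a hypothetical
    # tuning parameter; larger values approach the hard threshold.
    return 1.0 / (1.0 + np.exp(-sharpness * (prob - threshold)))

obs = np.array([1.0, 1.0, 0.0, 0.0, 1.0])   # observed events (toy data)
prob = np.array([0.9, 0.4, 0.2, 0.6, 0.7])  # predicted probabilities

scores = contingency_scores(obs, hard_forecast(prob))
for name, value in scores.items():
    print(f"{name}: {value:.3f}")

# Nudge the probability of the missed event (index 1) slightly upward.
nudged = prob.copy()
nudged[1] += 0.01
hard_change = (contingency_scores(obs, hard_forecast(nudged))["pod"]
               - scores["pod"])
soft_change = (contingency_scores(obs, soft_forecast(nudged))["pod"]
               - contingency_scores(obs, soft_forecast(prob))["pod"])
print(f"change in hard POD: {hard_change:.4f}")  # zero: no learning signal
print(f"change in soft POD: {soft_change:.4f}")  # positive: usable signal
```

The hard POD does not respond to the small improvement, so its gradient is zero. The soft POD increases, which is the kind of signal gradient-based training needs. In a real Keras loss, the same formulas would be written with TensorFlow operations, and a score to be maximized (such as POD) would be negated or subtracted from 1.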
Other fields in the environmental sciences have very similar needs for custom loss functions, e.g., for evaluating spatial forecasts effectively, and the concepts discussed here can be applied there as well.

Making it possible, and even easy, to use a variety of meaningful loss functions will go a long way toward tuning neural networks to focus on the performance criteria that truly matter in environmental science applications. This will help the science community make the most of neural networks. Scientists in other areas have invested in similar efforts. For example, researchers in the medical imaging community have developed a collection of loss functions specifically for medical image segmentation [1]. Loss function development for different purposes also remains a very active topic in computer science; see [2, 3, 4, 5] for a small sample of recent research.

1.1 A case in point: using measures from spatial model verification for neural networks

Here we briefly discuss one area with particularly high potential for the development and use of custom loss functions, namely neural networks for spatial forecasts in the environmental sciences. Meteorologists and other environmental scientists have developed an extensive set of evaluation measures for spatial model verification. Ideally, these measures would be used directly to train neural network forecast models. Why train a neural network on anything other than the criteria we seek to optimize? However, many networks are still trained using pixel-based measures, such as MAE, MSE, or RMSE, mainly because those are readily available as loss functions. Gilleland et al. [6] conducted a large-scale comparison of spatial verification methods, i.e., methods that compare a spatial forecast (an image) to an observation (also an image).
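A small synthetic example shows why pixel-based measures can be misleading for spatial forecasts. This NumPy sketch compares two forecasts of a 2 × 2 storm on a 10 × 10 grid. The first has the right storm displaced by 3 pixels, and the second predicts no storm at all. Both are scored with pixel-wise MSE and with the fractions skill score (FSS), using the standard neighborhood definition FSS = 1 − FBS/FBS_worst. The grid, storm, displacement, and neighborhood sizes are all assumptions chosen for illustration. The zero padding at the domain edge is also a simplifying choice.

```python
import numpy as np

def neighborhood_fraction(binary, n):
    """Fraction of event pixels in the n x n window centered on each pixel.

    n must be odd. Pixels outside the domain are treated as non-events
    (zero padding)."""
    half = n // 2
    padded = np.pad(binary.astype(float), half, mode="constant")
    rows, cols = binary.shape
    out = np.zeros((rows, cols))
    for i in range(rows):
        for j in range(cols):
            out[i, j] = padded[i:i + n, j:j + n].mean()
    return out

def fractions_skill_score(obs, fcst, n):
    """FSS = 1 - FBS / FBS_worst, computed on neighborhood fractions.

    Undefined (division by zero) if neither field contains an event."""
    p_o = neighborhood_fraction(obs, n)
    p_f = neighborhood_fraction(fcst, n)
    fbs = np.mean((p_f - p_o) ** 2)
    fbs_worst = np.mean(p_f ** 2) + np.mean(p_o ** 2)
    return 1.0 - fbs / fbs_worst

# Observed storm: a 2 x 2 block on a 10 x 10 grid.
obs = np.zeros((10, 10))
obs[4:6, 2:4] = 1.0
# Forecast A: the correct storm, displaced 3 pixels to the east.
shifted = np.roll(obs, 3, axis=1)
# Forecast B: no storm at all.
empty = np.zeros_like(obs)

for label, fcst in [("shifted storm", shifted), ("no storm", empty)]:
    pixel_mse = np.mean((fcst - obs) ** 2)
    fss_1 = fractions_skill_score(obs, fcst, n=1)
    fss_7 = fractions_skill_score(obs, fcst, n=7)
    print(f"{label}: MSE={pixel_mse:.2f}, FSS(n=1)={fss_1:.2f}, "
          f"FSS(n=7)={fss_7:.2f}")
```

Pixel-wise MSE penalizes the displaced storm twice, once where it was forecast and once where it was observed. As a result, it rates the empty forecast as better (0.04 versus 0.08). FSS at the pixel scale (n = 1) gives both forecasts 0. With a 7 × 7 neighborhood, FSS credits the near miss but still gives the empty forecast 0. The Python loops are for readability only. A loss function used in training would more likely compute the neighborhood fractions with a convolution or average-pooling operation.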
In [6, 7] they distinguish four primary classes of methods for spatial model verification²:
• Neighborhood: Methods that apply some kind of neighborhood averaging to both the forecast and the observation before applying a pixel-based comparison of the resulting smoothed images. Example: fractions skill score.

² A later classification by the same group lists five classes [8], but we prefer the original four categories described in [6, 7].

[1] José García Rodríguez et al. A Review on Deep Learning Techniques Applied to Semantic Segmentation. 2017, ArXiv.

[2] Eric Gilleland et al. Intercomparison of Spatial Forecast Verification Methods. 2009.

[3] Djemel Ziou et al. Image Quality Metrics: PSNR vs. SSIM. 2010, 2010 20th International Conference on Pattern Recognition.

[4] Li Fei-Fei et al. Perceptual Losses for Real-Time Style Transfer and Super-Resolution. 2016, ECCV.

[5] Ross B. Girshick et al. Focal Loss for Dense Object Detection. 2017, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[6] B. Brown et al. The Setup of the MesoVICT Project. 2018, Bulletin of the American Meteorological Society.

[7] Amy McGovern et al. Making the Black Box More Transparent: Understanding the Physical Implications of Machine Learning. 2019, Bulletin of the American Meteorological Society.

[8] Anuj Karpatne et al. CoPhy-PGNN: Learning Physics-guided Neural Networks with Competing Loss Functions for Solving Eigenvalue Problems. 2020, ACM Trans. Intell. Syst. Technol.

[9] W. Briggs. Statistical Methods in the Atmospheric Sciences. 2007.

[10] Randal J. Barnes et al. Controlled Abstention Neural Networks for Identifying Skillful Predictions for Classification Problems. 2021, Journal of Advances in Modeling Earth Systems.

[11] Timothy J. Schmit et al. A Closer Look at the ABI on the GOES-R Series. 2017.

[12] Chunhong Pan et al. Enhancing Pix2Pix for Remote Sensing Image Classification. 2018, 2018 24th International Conference on Pattern Recognition (ICPR).

[13] B. Brown et al. Application of the MODE Object-Based Verification Tool for the Evaluation of Model Precipitation Fields. 1992.

[14] A. H. Murphy et al. The Attributes Diagram: A Geometrical Framework for Assessing the Quality of Probability Forecasts. 1986.

[15] Pierre Baldi et al. Enforcing Analytic Constraints in Neural Networks Emulating Physical Systems. 2019, Physical Review Letters.

[16] Fan Yang et al. Image matching using structural similarity and geometric constraint approaches on remote sensing images. 2016.

[17] Vipin Kumar et al. Integrating Physics-Based Modeling with Machine Learning: A Survey. 2020, ArXiv.

[18] Alexei A. Efros et al. The Unreasonable Effectiveness of Deep Features as a Perceptual Metric. 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[19] A. Bovik et al. A universal image quality index. 2002, IEEE Signal Processing Letters.

[20] Imme Ebert-Uphoff et al. Applying machine learning methods to detect convection using GOES-16 ABI data. 2020.

[21] N. Roberts et al. Scale-Selective Verification of Rainfall Accumulations from High-Resolution Forecasts of Convective Events. 2008.

[22] Eric Gilleland et al. Verifying Forecasts Spatially. 2010.

[23] Randal J. Barnes et al. Controlled abstention neural networks for identifying skillful predictions for regression problems. 2021, Journal of Advances in Modeling Earth Systems.

[24] I. Ebert-Uphoff et al. Using deep learning to nowcast the spatial coverage of convection from Himawari-8 satellite data. 2021, Monthly Weather Review.

[25] Imme Ebert-Uphoff et al. Evaluation, Tuning, and Interpretation of Neural Networks for Working with Images in Meteorological Applications. 2020.

[26] Aurélien Géron et al. Hands-On Machine Learning with Scikit-Learn and TensorFlow: Concepts, Tools, and Techniques to Build Intelligent Systems. 2017.

[27] Xiaolin Wu et al. Adaptive Loss Function for Super Resolution Neural Networks Using Convex Optimization Techniques. 2020, ArXiv.

[28] Jonathan T. Barron et al. A General and Adaptive Robust Loss Function. 2017, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[29] Deepu Rajan et al. An image similarity descriptor for classification tasks. 2020, J. Vis. Commun. Image Represent.

[30] Andrew Glaws et al. Adversarial super-resolution of climatological wind and solar data. 2020, Proceedings of the National Academy of Sciences.

[31] David Hall et al. Tropical and Extratropical Cyclone Detection Using Deep Learning. 2020, Journal of Applied Meteorology and Climatology.

[32] Gregory P. Meyer. An Alternative Probabilistic Interpretation of the Huber Loss. 2021, 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[33] Alexei A. Efros et al. Image-to-Image Translation with Conditional Adversarial Networks. 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[34] Yutaka Satoh et al. Anticipating Traffic Accidents with Adaptive Loss and Large-Scale Incident DB. 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[35] Andrew Zisserman et al. Very Deep Convolutional Networks for Large-Scale Image Recognition. 2014, ICLR.

[36] Richard Dosselmann et al. A comprehensive assessment of the structural similarity index. 2011, Signal Image Video Process.

[37] Anne L. Martel et al. Loss odyssey in medical image segmentation. 2021, Medical Image Anal.

[38] Nagiza F. Samatova et al. Theory-Guided Data Science: A New Paradigm for Scientific Discovery from Data. 2016, IEEE Transactions on Knowledge and Data Engineering.

[39] Kai Yang et al. Optimized-SSIM Based Quantization in Optical Remote Sensing Image Compression. 2011, 2011 Sixth International Conference on Image and Graphics.

[40] Steven D. Miller et al. Development and Interpretation of a Neural Network-Based Synthetic Radar Reflectivity Estimator Using GOES-R Satellite Observations. 2020, Journal of Applied Meteorology and Climatology.

[41] Zhou Wang et al. Multiscale structural similarity for image quality assessment. 2003, The Thirty-Seventh Asilomar Conference on Signals, Systems & Computers, 2003.

[42] Imme Ebert-Uphoff et al. Using deep learning to emulate and accelerate a radiative-transfer model. 2021, Journal of Atmospheric and Oceanic Technology.