The "weight smoothing" regularization of MLP for Jacobian stabilization

In an approximation problem with a neural network, a low output root mean square (rms) error is not always a sufficient quality criterion. In this paper, we investigate problems where the Jacobians (the first derivatives of an output value with respect to an input value) of the approximation model are needed, and we propose to add a quality criterion on these Jacobians during the learning step. More specifically, we focus here on the approximation of functionals A from a space of continuous functions (discretized in practice) to a scalar space. In this setting, the approximation is confronted with the compensation phenomenon: a lower contribution from one input can be compensated by a larger one from its neighboring inputs. As a result, the profiles (with respect to the input index) of the neural Jacobians are very irregular instead of smooth, and the approximation of A becomes an ill-posed problem because many solutions can be chosen by the learning process. We propose to introduce the smoothness of Jacobian profiles as a priori information via a regularization technique and develop a new and efficient learning algorithm, called "weight smoothing." We assess the robustness of the weight smoothing algorithm by testing it on a real and complex problem stemming from meteorology: the neural approximation of the forward model of the radiative transfer equation in the atmosphere. The stabilized Jacobians of this model are then used in an inversion process to illustrate the improvement of the Jacobians after weight smoothing.
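To make the regularization idea concrete, the following is a minimal sketch in JAX, not the paper's implementation: a one-hidden-layer MLP maps a discretized input profile to a scalar, and the training loss adds the squared discrete second differences of the Jacobian profile to the usual fit term. The function names, the tanh activation, and the regularization weight lam are illustrative assumptions; the paper's "weight smoothing" algorithm is presented as an efficient way to impose this kind of Jacobian-smoothness constraint.

    import jax
    import jax.numpy as jnp

    def mlp(params, x):
        # One-hidden-layer MLP mapping a discretized input profile x
        # (e.g., an atmospheric profile) to a scalar output.
        W1, b1, w2, b2 = params
        return w2 @ jnp.tanh(W1 @ x + b1) + b2

    def jacobian_profile(params, x):
        # Neural Jacobian dy/dx_i, viewed as a profile over the input index i.
        return jax.grad(mlp, argnums=1)(params, x)

    def jacobian_roughness(j):
        # Squared discrete second differences of the Jacobian profile:
        # large when the profile is irregular, zero when it is linear in i.
        return jnp.sum((j[2:] - 2.0 * j[1:-1] + j[:-2]) ** 2)

    def loss(params, xs, ys, lam):
        # rms-type fit term plus the Jacobian-smoothness penalty, weighted
        # by the (hypothetical) regularization parameter lam.
        preds = jax.vmap(lambda x: mlp(params, x))(xs)
        fit = jnp.mean((preds - ys) ** 2)
        jacs = jax.vmap(lambda x: jacobian_profile(params, x))(xs)
        penalty = jnp.mean(jax.vmap(jacobian_roughness)(jacs))
        return fit + lam * penalty

Gradients of this loss can be obtained with jax.grad(loss) and fed to any optimizer; lam trades the fit against the smoothness of the Jacobian profiles, in the spirit of Tikhonov-style regularization.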
