Rational activation functions in neural networks with uniform-norm based loss functions and their application in classification

In this paper, we demonstrate the application of generalised rational uniform (Chebyshev) approximation in neural networks. In particular, our activation functions are rational functions of degree one, and the loss function is based on the uniform norm. In this setting, when the coefficients of the rational activation functions are fixed, the overall optimisation problem of the network becomes a generalised rational uniform approximation problem in which the weights and biases of the network are the decision variables. To optimise these decision variables, we suggest two prominent methods: the bisection method and the differential correction algorithm. We perform numerical experiments on two-class classification problems and report the classification accuracy obtained with the bisection method and the differential correction algorithm, alongside the standard MATLAB toolbox, which uses a least-squares loss function. We show that the combination of a uniform-norm based loss function, rational activation functions and the bisection method leads to better classification accuracy when the training dataset is very small or the classes are imbalanced.
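To make the formulation concrete, the display below sketches the generic shape of a generalised rational uniform approximation problem. The notation is ours, not taken from the paper: $p$ and $q$ denote functions that are linear in the decision variables $\mathbf{w}$ once the activation coefficients are fixed, and $(\mathbf{x}_i, y_i)$, $i = 1, \dots, N$, are the training pairs.

```latex
\min_{\mathbf{w}} \; \max_{1 \le i \le N}
\left| y_i - \frac{p(\mathbf{x}_i, \mathbf{w})}{q(\mathbf{x}_i, \mathbf{w})} \right|
\quad \text{subject to} \quad q(\mathbf{x}_i, \mathbf{w}) > 0, \quad i = 1, \dots, N.
```

When $p$ and $q$ are linear in $\mathbf{w}$, every sublevel set of the objective is cut out by linear inequalities, so the objective is quasiconvex on the feasible region and the optimal deviation can be located by bisection: for a fixed target $t$, the question "is a uniform deviation of at most $t$ achievable?" reduces to a linear feasibility problem. The Python sketch below illustrates that reduction under these assumptions; it is our own illustration, not the authors' implementation, and the basis matrices `G` and `H`, the normalisation $q_i \ge 1$ in place of $q_i > 0$, and the crude initial upper bound are all assumptions.

```python
import numpy as np
from scipy.optimize import linprog

def feasible(t, G, H, y):
    """Bisection feasibility check: can max_i |y_i - a.g_i / b.h_i| be
    pushed down to t?  For p, q linear in their coefficients this is a
    linear programme.  G (N x m) holds numerator basis values g(x_i),
    H (N x k) denominator basis values h(x_i); q_i >= 1 is a standard
    normalisation replacing q_i > 0 (valid by rescaling (a, b))."""
    N, m = G.shape
    k = H.shape[1]
    # |y_i - p_i/q_i| <= t with q_i >= 1 is, after multiplying through
    # by q_i > 0, three families of linear inequalities in (a, b):
    #    p_i - (y_i + t) q_i <= 0
    #   -p_i + (y_i - t) q_i <= 0
    #   -q_i                 <= -1
    A_ub = np.vstack([
        np.hstack([ G, -(y + t)[:, None] * H]),
        np.hstack([-G,  (y - t)[:, None] * H]),
        np.hstack([np.zeros((N, m)), -H]),
    ])
    b_ub = np.concatenate([np.zeros(2 * N), -np.ones(N)])
    # Zero objective: we only ask for a feasible point.
    res = linprog(np.zeros(m + k), A_ub=A_ub, b_ub=b_ub,
                  bounds=[(None, None)] * (m + k), method="highs")
    return res.success, (res.x[:m], res.x[m:]) if res.success else None

def bisection(G, H, y, tol=1e-6):
    """Bisection on the achievable uniform deviation t."""
    lo, hi = 0.0, np.max(np.abs(y)) + 1.0  # hi is feasible whenever h
    best = None                            # contains a constant basis term
    while hi - lo > tol:
        t = 0.5 * (lo + hi)
        ok, coeffs = feasible(t, G, H, y)
        if ok:
            hi, best = t, coeffs  # t achievable: shrink from above
        else:
            lo = t                # t not achievable: raise the floor
    return hi, best
```

The differential correction algorithm attacks the same problem through a sequence of linear programmes that directly decrease the maximal deviation rather than testing feasibility at a fixed level, and it typically needs fewer outer iterations; both methods hinge on $p$ and $q$ entering the constraints linearly once the rational activation coefficients are fixed.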
