Second-Order Trust-Region Optimization for Data-Limited Inference

Neural networks generally require large amounts of data to adequately model a domain. When data are limited, the predictions of models trained with standard stochastic gradient descent (SGD) minimization algorithms can be poor. In these cases, more sophisticated optimization approaches become crucial for increasing the impact of each training iteration. In this paper, we propose an optimization algorithm that exploits second-derivative (curvature) information to avoid saddle points, which can stall the learning process. In particular, we adopt a Hessian-free approach: rather than explicitly storing the second-derivative matrix, we apply a conjugate gradient method that accesses the Hessian only through matrix-vector products. Our approach is based on trust-region methods, which do not require the Hessian to be positive definite; this property distinguishes our approach from existing Hessian-free methods. We present numerical experiments demonstrating that our proposed approach improves classification accuracy over a standard SGD approach.
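
To make the two core ingredients concrete, the sketch below (not the authors' implementation; function names, the toy loss, and hyperparameters are illustrative assumptions) shows, in JAX, (i) a Hessian-vector product computed by automatic differentiation without forming the Hessian, and (ii) a Steihaug-Toint conjugate gradient solver for the trust-region subproblem, which handles indefinite Hessians by stepping to the trust-region boundary when negative curvature is detected.

```python
# Minimal sketch of Hessian-free trust-region optimization (assumed names).
import jax
import jax.numpy as jnp

def hvp(loss_fn, params, v):
    """Hessian-vector product H @ v via forward-over-reverse autodiff,
    without ever forming the Hessian matrix explicitly."""
    return jax.jvp(jax.grad(loss_fn), (params,), (v,))[1]

def _to_boundary(p, d, delta):
    """Return p + tau * d with ||p + tau * d|| = delta and tau >= 0."""
    a = d @ d
    b = 2.0 * (p @ d)
    c = p @ p - delta ** 2
    tau = (-b + jnp.sqrt(b ** 2 - 4.0 * a * c)) / (2.0 * a)
    return p + tau * d

def cg_steihaug(loss_fn, params, grad, delta, max_iter=50, tol=1e-6):
    """Approximately solve the trust-region subproblem
        min_p  g^T p + 0.5 p^T H p   subject to  ||p|| <= delta
    with the Steihaug-Toint conjugate gradient method. The Hessian is
    accessed only through hvp(), and negative curvature directions are
    followed to the boundary, so H need not be positive definite."""
    p = jnp.zeros_like(grad)
    r = grad            # residual of H p + g = 0 at p = 0
    d = -r              # initial search direction
    rTr = r @ r
    for _ in range(max_iter):
        Hd = hvp(loss_fn, params, d)
        dHd = d @ Hd
        if dHd <= 0.0:
            # Negative (or zero) curvature: step along d to the boundary.
            return _to_boundary(p, d, delta)
        alpha = rTr / dHd
        p_next = p + alpha * d
        if jnp.linalg.norm(p_next) >= delta:
            # CG iterate leaves the trust region: clip to the boundary.
            return _to_boundary(p, d, delta)
        p = p_next
        r = r + alpha * Hd
        rTr_new = r @ r
        if jnp.sqrt(rTr_new) < tol:
            break
        d = -r + (rTr_new / rTr) * d
        rTr = rTr_new
    return p

# Example: one trust-region step on a toy quadratic whose Hessian
# diag(1, -1) is indefinite, so SGD-style curvature assumptions fail.
loss = lambda w: 0.5 * (w[0] ** 2 - w[1] ** 2) + w[0]
w = jnp.array([1.0, 0.5])
step = cg_steihaug(loss, w, jax.grad(loss)(w), delta=1.0)
```

In a full trust-region loop, the radius `delta` would then be grown or shrunk based on how well the quadratic model predicted the actual loss reduction along `step`; that acceptance logic is omitted here for brevity.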