A Quasi-Newton Method with No Derivatives

The Davidon formula and others of the "quasi-Newton" class, which are used in the unconstrained minimization of a function f, provide a (generally) convergent sequence of approximations to the Hessian of f. These formulas, however, require the independent calculation of the gradient of f. In this paper, a set of new formulas is derived, using a previously described variational approach, which successively approximates the gradient as well as the Hessian, and uses only function values. These formulas are incorporated into an algorithm which, although still crude, works quite well for various standard test functions. Extensive numerical results are presented.

1. Introduction. The so-called variable-metric method for minimizing functions, which was discovered by Davidon [1] and developed by Fletcher and Powell [2], has been so successful that it has attracted a great deal of interest. Various theoretical studies, as well as new, related algorithms, have appeared in the literature ([3]-[6], among many others). So far, all but one* of these variants of the DFP (Davidon-Fletcher-Powell) method have required the explicit evaluation, at each step, of the gradient of the function f to be minimized. From these computed gradients, the inverse of the Hessian matrix is gradually constructed, and the Newton formula (which is used to compute the next step direction) becomes gradually more accurate.

In a previous publication [7], it was shown how DFP-like formulas could be derived by solving a certain variational problem. In this paper, the same method will be applied to finding quasi-Newton** formulas which do not involve the explicit calculation of gradients. Clearly, since the gradient is needed in the Newton formula, the new algorithm will have to estimate it, as well as the Hessian, in the same way as the inverse Hessian is estimated in the DFP method.***

The basic notation to be used is as follows: $f(x)$ is the function of the variables $(x_1, x_2, \ldots, x_N)$ in $R^N$ which is to be minimized; $\bar{g}$ and $\bar{G}$ are the gradient and Hessian of $f$, respectively. In the course of the work, certain estimates of these quantities will be discussed; these will be denoted by $g$ and $G$ (without bars). Further, $H \equiv G^{-1}$. At certain stages, vectors specifying directions for line searches are introduced; the letter $d$ is used to denote these. When a direction vector $d$ has been normalized (in a sense to be outlined later), the normalized direction is denoted by the letter $s$. Using a starting point $x_0$ and a unit direction $s$, a straight line in $R^N$ may be expressed parametrically as $x(\lambda) = x_0 + \lambda s$.
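For reference, the classical DFP iteration described above can be sketched in a few lines. This is a minimal illustration, not the derivative-free method derived in this paper: it evaluates the gradient explicitly at each step, which is precisely what the new formulas avoid. The rank-two update of $H$ is the standard DFP formula; the function names and the crude backtracking line search are assumptions made only to keep the example self-contained and runnable.

```python
# Reference sketch of the classical DFP update (gradients computed
# explicitly, unlike the derivative-free method of this paper).
import numpy as np

def dfp_minimize(f, grad, x0, iters=50):
    """Minimize f from x0 using the DFP inverse-Hessian update.

    H approximates the inverse Hessian G^{-1}; the quasi-Newton
    direction d = -H g plays the role of the Newton formula d = -G^{-1} g.
    """
    x = np.asarray(x0, dtype=float)
    H = np.eye(x.size)                  # initial estimate of G^{-1}
    g = grad(x)
    for _ in range(iters):
        d = -H @ g                      # quasi-Newton step direction
        lam = 1.0                       # crude backtracking line search
        while f(x + lam * d) > f(x) and lam > 1e-12:
            lam *= 0.5
        x_new = x + lam * d
        g_new = grad(x_new)
        delta = x_new - x               # step taken in x
        gamma = g_new - g               # observed change in the gradient
        if abs(delta @ gamma) > 1e-12:  # DFP rank-two correction of H
            H = (H
                 + np.outer(delta, delta) / (delta @ gamma)
                 - np.outer(H @ gamma, H @ gamma) / (gamma @ H @ gamma))
        x, g = x_new, g_new
        if np.linalg.norm(g) < 1e-10:
            break
    return x

# Example: on a convex quadratic, H gradually becomes the true inverse
# Hessian and the step direction approaches the exact Newton step.
A = np.array([[3.0, 1.0], [1.0, 2.0]])
f = lambda x: 0.5 * x @ A @ x
grad = lambda x: A @ x
print(dfp_minimize(f, grad, [4.0, -3.0]))  # -> approximately [0, 0]
```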
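To make the notation concrete, the following sketch parametrizes a line search and estimates the slope of $f$ along a direction from function values alone. The central difference used here is only an illustration of the "no derivatives" idea; the paper instead derives quasi-Newton update formulas for $g$ and $G$. All names are hypothetical, and the ordinary Euclidean normalization stands in for the normalization the paper defines later.

```python
# Illustration of the notation: unit direction s, the parametrized line
# x(lam) = x0 + lam * s, and a function-values-only slope estimate.
import numpy as np

def normalize(d):
    """Return a unit vector s from a direction d (Euclidean norm here)."""
    return d / np.linalg.norm(d)

def line_point(x0, s, lam):
    """Point on the straight line x(lam) = x0 + lam * s in R^N."""
    return x0 + lam * s

def directional_slope(f, x0, s, h=1e-5):
    """Estimate the slope of f along s at x0 using function values only
    (a central difference; the paper uses update formulas instead)."""
    return (f(line_point(x0, s, h)) - f(line_point(x0, s, -h))) / (2.0 * h)

# Example: f(x) = x1^2 + 2*x2^2 has gradient (2, 4) at (1, 1), so the
# slope along the first coordinate direction should be about 2.
f = lambda x: x[0]**2 + 2.0 * x[1]**2
s = normalize(np.array([1.0, 0.0]))
print(directional_slope(f, np.array([1.0, 1.0]), s))  # -> approximately 2.0
```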