A Family of Gradient Methods for Optimization

The approaches to the problem of maximising or minimising a nonlinear function of several variables can be put into three categories according to how much information is extracted from the function.

1. Only the function value. The better-known methods in this category are those of Nelder and Mead (1965), Powell (1965), Rosenbrock (1960), and Swann (1964).

2. The gradient. The main technique in this category is the method of steepest ascent, also known as the 'gradient method'.

3. The matrix of second derivatives. This matrix may be computed directly, as in the method of successive approximations or Newton-Raphson method, or may be approximated as in the methods of Barnes (1965), Fletcher and Powell (1963), Fletcher and Reeves (1964), and Powell (1965).

The advantages and disadvantages of these methods have been discussed in detail elsewhere (for example, Box, 1966; Wilde and Beightler, 1967). The methods in the first category are especially useful when computed values are subject to error, since the function value is generally less unstable than derivatives of the first or higher order. They are also essential where computation of values other than the function value is impractical or impossible. Otherwise, their slow convergence and sensitivity to sudden changes in the slope of the surface make these methods generally inferior to those in the other two categories.

For the majority of unconstrained optimization problems, the methods in the third category will be easily superior to the gradient methods. Their convergence for nearly quadratic functions is very rapid, and they provide valuable information about the curvature at the optimum. Even for severely non-quadratic functions, the availability of good initial approximations to the solution can often ensure that these methods will reach the final solution quickly.

This paper is concerned with those functions which may cause difficulties in the use of second-derivative approaches. The presence of sharp ridges may make round-off error in the computer a severe problem with these methods, and sudden changes in the direction of a ridge may make their convergence slow, since the ridges of quadratic functions, for which these methods are usually convergent, are straight. On the other hand, the function might be well behaved but there may simply be too many variables to allow convenient storage of the matrix of second derivatives. Finally, there is the large class of problems for which constraints on the variables exist. Methods which rely on finding the optimum along a specified line in parameter space will tend to strike the boundaries of the admissible region more often than those which take small steps. Transformations of the kind suggested by Box (1966) imply that once a constraint has become binding in this way, it cannot be relaxed again except by some form of external intervention. For these problems, the gradient method is very useful, and references on nonlinear programming treat it as a widely applicable method for problems of this type (Hadley, 1964).

This paper defines a class of methods which fall into the second category. They do not attempt to compute or approximate second derivatives and do not require the location of an optimum on a line. They contain the gradient method as a special case, and it will be shown that in general the gradient method is not an efficient member of this class.
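For concreteness, the classical updates underlying the second and third categories may be written down explicitly. The notation below (current point x_k, gradient \nabla f, matrix of second derivatives H, step length \lambda_k > 0) is introduced here purely for illustration and is not taken from the remainder of the paper. The method of steepest ascent moves from the current point along the gradient,

    x_{k+1} = x_k + \lambda_k \nabla f(x_k),

while the Newton-Raphson method uses the matrix of second derivatives directly,

    x_{k+1} = x_k - H(x_k)^{-1} \nabla f(x_k).

Purely to fix ideas, a method of the second category containing the gradient method as a special case might replace the scalar step length by a matrix, taking a step of the form \lambda_k D_k \nabla f(x_k) with D_k positive definite; the choice D_k = I recovers steepest ascent. The particular family studied in this paper is specified in the sections that follow.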
The important advantages which some other members of this class possess will be outlined in the next section. Section 3 discusses some convergence-accelerating possibilities, and Section 4 provides an example of a function for which these methods are particularly appropriate.