Quasi-Newton methods and their application to function minimisation

can in general only be found by an iterative process in which successively better, in some sense, approximations to the solution are computed. Of the methods available, most rely on evaluating at each stage of the calculation a set of residuals and from these obtaining a correction to each element of the approximate solution. The most common way of doing this is to take each correction to be a suitable linear combination of the residuals. There is, of course, no reason in principle why more elaborate schemes should not be used, but they are difficult both to analyse theoretically and to implement in practice.

The minimisation of a function of n variables, for which it is possible to obtain analytic expressions for the n first partial derivatives, is a particular example of this type of problem. Any technique used to solve nonlinear equations may be applied to the expressions for the partial derivatives but, because it is known in this case that the residuals form the gradient of some function, it is possible to introduce refinements into the method of solution to take account of this extra information. Since, in addition, the value of the function itself is known, further refinements are possible.

The best-known method of solving a general set of simultaneous nonlinear equations, in which the corrections are computed as linear combinations of the residuals, is the Newton-Raphson method. The principal disadvantage of this method lies in the necessity of evaluating and inverting the Jacobian matrix at each stage of the iteration, and so a number of methods have arisen, e.g. [1], [2], [4] and [8], in which the inverse Jacobian matrix is replaced by an approximation that is modified in some simple manner at each iteration. Although each method has its own peculiarities, certain properties are common to a large class of these methods, and several of these are discussed here.
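The contrast drawn above can be sketched in code. The following is a minimal illustration, not taken from the paper: where Newton-Raphson would evaluate and invert the Jacobian at every step, here the Jacobian is approximated (by finite differences) and inverted once, after which the approximate inverse is modified by a rank-one correction at each iteration. The particular correction shown is Broyden's single-rank update, one member of the class of methods under discussion; the function names `fd_jacobian` and `broyden_solve` are illustrative, not the paper's.

```python
import numpy as np

def fd_jacobian(f, x, eps=1e-7):
    """Forward-difference approximation to the Jacobian of f at x."""
    fx = f(x)
    J = np.empty((fx.size, x.size))
    for j in range(x.size):
        xp = x.copy()
        xp[j] += eps
        J[:, j] = (f(xp) - fx) / eps
    return J

def broyden_solve(f, x0, tol=1e-10, max_iter=100):
    """Solve f(x) = 0 in the quasi-Newton manner: the Jacobian is
    evaluated and inverted once only; thereafter the approximate
    inverse H is modified by a simple rank-one correction."""
    x = np.asarray(x0, dtype=float)
    H = np.linalg.inv(fd_jacobian(f, x))  # one inversion, then updates only
    fx = f(x)
    for _ in range(max_iter):
        if np.linalg.norm(fx) < tol:
            break
        s = -H @ fx            # correction: a linear combination of the residuals
        x_new = x + s
        f_new = f(x_new)
        y = f_new - fx
        Hy = H @ y
        # rank-one (Sherman-Morrison type) update of the approximate inverse
        H += np.outer(s - Hy, s @ H) / (s @ Hy)
        x, fx = x_new, f_new
    return x
```

For example, applied to the pair of equations x₁² − x₂ − 1 = 0, x₂ − x₁ = 0 from the starting point (1.5, 1.5), the iteration converges to x₁ = x₂ = (1 + √5)/2 without any further Jacobian evaluations.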
In particular, if it is known that the functions to be zeroed are the first partial derivatives of a function F, then it is possible, if F is quadratic, to modify the approximating matrix in such a way that F is minimised in a finite number of steps. This method of modification is not unique and leads to a subclass of methods of which one example is the method of Davidon [3] as amended by Fletcher and Powell [4]. Since in the methods under discussion the corrections are computed as linear combinations of the residuals, it is natural to introduce matrix notation. Thus a function fⱼ of the variables x₁, x₂, …, xₙ may be regarded as a function of the nth order vector x, and each fⱼ in turn may be treated as the jth element of the nth