Least Squares Routines to Supplement Those of Gentleman

Gentleman (1974) gave some routines (algorithm AS 75) in Algol for linear least squares calculations; Griffiths and Hill (1985) contains a translation into Fortran. They do not include routines for changing the order of variables, for handling singularities, for calculating correlations or partial correlations from the Cholesky factorization, for calculating regression coefficients for a subset of the variables or for calculating the estimated covariances of such regression coefficients. Clarke (1981) provided an algorithm for changing the order of variables, but it requires the elements of the Cholesky factorization to be stored in a different order from Gentleman's, and it overwrites the implicit ls on the diagonal of the upper triangular factor with the row multipliers. The Cholesky factor is represented here as D112R where D is a diagonal matrix of row multipliers and R is an upper triangular matrix with ls on its diagonal. In the Fortran code, the diagonal of D is stored in arrayD and R is stored by rows, excluding its diagonal elements, in the one-dimensional array RBAR. This is exactly the same as in algorithm AS 75. This set of routines can be used with orthogonal reductions produced by using Gentleman's routines, or on their own. The basic algorithm used for updating or changing the order of variables is that used by Gentleman, though the present author has used the so-called fast planar rotation method described by Hammarling (1974) for many years. The Hammarling algorithm is faster for large numbers of variables, but its diagonal multipliers need fairly frequent rescaling, and this produces longer code and partially destroys the speed advantage. For compatibility with algorithm AS 75, the Hammarling algorithm has not been used here. The algorithm used by Gentleman is not well suited when there are more variables than observations. For instance, in near infra-red spectroscopy it is common to have say 500 variables (wavelengths) but only say 30 cases. Algorithm AS 75 requires the storage of the whole upper triangle of 124 750 elements, rather than just the first 30 rows of non-zero values which requires about 15 000 storage locations. If the user is confident that there are no linear dependencies among any 30 of the variables, then