General-Error Regression for Deriving Cost-Estimating Relationships

Abstract: Development of cost-estimating relationships (CERs) from historical data in most cost models is based on explicit solutions of the classical least-squares linear regression equation Y = a + bX + E, where Y is cost, X is the numerical value of a cost driver, E is a Gaussian error term whose variance does not depend on the numerical value of X, and a and b are numerical coefficients derived from the historical data. The coefficients of nonlinear forms such as Y = aX^b E are derived by taking logarithms of both sides, which reduces the formulation to log(Y) = log(a) + b log(X) + log(E). This approach has a number of well-documented weaknesses. The first is that the error of estimation is expressed in meaningless units (“log dollars”). A second is that the analyst is forced to assume an additive-error model (a uniform dollar value across the board) when historical data indicate a linear relationship between cost driver and cost, but a multiplicative-error (a percentage of the estim...
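As an illustration of the log-transform approach described above, the following is a minimal sketch in Python, not the procedure of any particular cost model. The function names (`fit_linear`, `fit_power_by_logs`) and the data values are hypothetical and chosen only for demonstration. It fits Y = aX^b E by ordinary least squares on log(Y) = log(a) + b log(X) + log(E) and shows that the resulting standard error is measured in log space rather than in dollars.

```python
import math


def fit_linear(xs, ys):
    """Ordinary least squares for y = a + b*x; returns (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b


def fit_power_by_logs(xs, ys):
    """Fit y = a * x**b by OLS on log(y) = log(a) + b*log(x); returns (a, b)."""
    log_xs = [math.log(x) for x in xs]
    log_ys = [math.log(y) for y in ys]
    log_a, b = fit_linear(log_xs, log_ys)
    return math.exp(log_a), b


# Hypothetical data (illustration only): cost driver values and costs
# generated from y = 3 * x**0.5 with made-up multiplicative error factors.
xs = [1.0, 4.0, 9.0, 16.0, 25.0]
factors = [1.10, 0.90, 1.05, 0.95, 1.00]
ys = [3.0 * x ** 0.5 * f for x, f in zip(xs, factors)]

a_hat, b_hat = fit_power_by_logs(xs, ys)

# Residuals and standard error are computed on log(Y), so their unit is
# "log dollars", not dollars. Two coefficients were fitted, hence n - 2.
log_residuals = [
    math.log(y) - (math.log(a_hat) + b_hat * math.log(x))
    for x, y in zip(xs, ys)
]
log_std_error = math.sqrt(sum(r ** 2 for r in log_residuals) / (len(xs) - 2))

print(f"a = {a_hat:.4f}, b = {b_hat:.4f}")
print(f"standard error in log space = {log_std_error:.4f}")
```

The standard error printed here applies to log(Y), which is the interpretability problem the abstract points to: it cannot be read directly as a dollar amount or as a percentage of the estimate without further transformation.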