Residuals and influence in the multivariate linear model

Regression diagnostics for the multivariate linear model are developed along the lines of the theory for the linear model with univariate response. Internally and externally studentised forms of residuals are given and their distributions found. Distance measures suitable for the assessment of the influence of particular cases on the estimated regression coefficients are considered.

The examination of residuals and influence statistics is of great importance in assessing a regression model. Cook & Weisberg (1982) provide an extensive discussion of relevant methods for the linear model with a single response variable. The purpose of the present note is to apply these ideas to the multivariate linear regression problem.

Although ordinary least squares estimates of regression coefficients are the same in the multivariate and univariate analyses, there are obvious reasons for carrying out the multivariate analysis to consider simultaneously the different response variables. One is the possibility that the residual for one response variable in a particular case may not seem to be out of the ordinary in relation to other residuals for that response, but only in relation to the residuals for other responses on the same case. Another is that we may be interested in specifically multivariate aspects of the data. For example, in the problem that prompted this investigation, the main item of interest was the matrix of inter-correlations between five indicators of pollution from sampling stations in the Aegean Sea. This was calculated as the matrix of correlations between the residuals from the regressions of the indicators on covariates including temperature and pH of the seawater. Correlations are particularly vulnerable to distortion by outlying values (Gnanadesikan & Kettenring, 1972), so examination of the multivariate residuals to protect against this was essential.

In the following two sections, multivariate residuals and influence measures are presented.
Section 4 outlines an application to illustrate the usefulness of the method-