Identifying outliers in correlated water quality data

Evaluating water quality data for outliers is a sound quality control/quality assessment procedure, whether the data are used for monitoring or for modeling. Water quality variables are often correlated; for example, carbonaceous biochemical oxygen demand (CBOD) is correlated to some degree with NH3. Univariate methods for identifying outliers do not consider the correlation between variables, so they may flag too many data points as outliers or miss observations that have extreme ratios between variables, e.g., a raw wastewater sample with relatively low CBOD but high NH3. Testing for outliers with multivariate methods such as the Mahalanobis distance, the jackknife distance, p-values, or Hadi's method automatically incorporates the correlation or covariance between variables and is therefore fundamentally more correct. Such multivariate methods can better identify potential outliers and avoid eliminating valid data.
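The contrast between univariate and multivariate screening can be illustrated with a minimal sketch of the Mahalanobis distance. The data below are synthetic and purely hypothetical (they are not measurements from this work), the function name mahalanobis_sq is ours, and the 0.999 chi-square cutoff is one common convention assumed here for illustration. The last sample has low CBOD and high NH3, each unremarkable on its own, but an unusual ratio.

```python
import numpy as np

def mahalanobis_sq(X):
    """Squared Mahalanobis distance of each row of X from the sample mean."""
    X = np.asarray(X, dtype=float)
    diff = X - X.mean(axis=0)
    cov = np.cov(X, rowvar=False)  # sample covariance of the columns
    # Row-wise quadratic form diff_i^T cov^{-1} diff_i, without an explicit inverse.
    return np.einsum("ij,ij->i", diff, np.linalg.solve(cov, diff.T).T)

# Hypothetical, synthetic data (mg/L): NH3 is strongly correlated with CBOD.
rng = np.random.default_rng(0)
cbod = rng.normal(200.0, 40.0, 200)
nh3 = 0.125 * cbod + rng.normal(0.0, 1.5, 200)
X = np.column_stack([cbod, nh3])

# Append one sample with relatively low CBOD but high NH3.
X = np.vstack([X, [140.0, 33.0]])

# Univariate screen: z-score of each variable taken separately.
z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)

# Multivariate screen: compare d^2 with a chi-square quantile.
# For 2 variables the quantile has the closed form -2 * ln(1 - p).
d2 = mahalanobis_sq(X)
cutoff = -2.0 * np.log(1.0 - 0.999)

print("z-scores of appended sample:", z[-1])
print("squared Mahalanobis distance of appended sample:", d2[-1])
print("cutoff:", cutoff)
```

In this sketch the appended sample passes a |z| < 3 screen on each variable but lies far beyond the multivariate cutoff, because its CBOD-to-NH3 ratio contradicts the correlation in the rest of the data. Note that the mean and covariance here are estimated from all samples, including the suspect one; robust variants such as the jackknife distance reduce that influence.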