Problems of hypothesis testing of regressions with multiple measurements from individual sampling units

Abstract

Problems with hypothesis testing arise when regression analysis is applied to data sets that contain multiple measurements from individual sampling units. These sampling units might be individual trees, each measured at several positions on the tree; individual plots, in each of which many trees were measured; or individual plots, each measured at several ages. The problems arise because measurements from the same sampling unit are generally correlated, whereas ordinary least-squares regression treats every observation as independent. Applied to such data sets, it underestimates the covariance matrix of the parameter estimates and the residual variance of the regression equation. Consequently, the statistical tests involved in tasks such as a covariance analysis, or determining the most appropriate form of equation to fit to a data set, cannot be carried out properly. Theory already exists to solve these problems when the measurements in every sampling unit are made under the same set of conditions, as is often the case with data from designed experiments. Forestry data sets are often not of this nature, so these solutions are generally inappropriate. Using a simple example, the present work explains how problems of hypothesis testing of regressions arise with such data sets. Six theoretical attempts to solve the problems are reviewed. All of these theories apply only asymptotically, that is, when the number of sampling units is very large. Their small-sample behaviour is unknown, and their usefulness is therefore questioned. Practical methods for handling such data sets are suggested; in particular, a technique that analyzes the data in two stages has been found most useful. A number of examples of forestry problems from the literature are described to show the range of circumstances in which these difficulties occur.
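The mechanism described above can be illustrated with a small simulation. This is a minimal sketch, not the procedure from the paper. It makes several assumptions. The data set is hypothetical, and each sampling unit (for example, a plot) has its own random intercept and slope, with arbitrary parameter values. The sketch also uses one simple form of two-stage analysis: a straight line is fitted to each unit separately, and the unit-level slopes are then treated as independent observations. All function names are illustrative only. The code compares the naive pooled OLS standard error of the slope with the actual spread of the pooled slope across repeated simulated data sets.

```python
import random
import statistics


def ols_line(xs, ys):
    """Fit y = a + b*x by ordinary least squares; return (a, b, residual variance, Sxx)."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = my - b * mx
    rss = sum((y - a - b * x) ** 2 for x, y in zip(xs, ys))
    return a, b, rss / (n - 2), sxx


def simulate_units(n_units, n_per_unit, rng):
    """Hypothetical data: each sampling unit has its own random intercept and slope."""
    units = []
    for _ in range(n_units):
        a_u = 10.0 + rng.gauss(0.0, 2.0)  # unit-level intercept (assumed values)
        b_u = 1.5 + rng.gauss(0.0, 0.5)   # unit-level slope (assumed values)
        xs = [rng.uniform(0.0, 10.0) for _ in range(n_per_unit)]
        ys = [a_u + b_u * x + rng.gauss(0.0, 1.0) for x in xs]
        units.append((xs, ys))
    return units


def pooled_ols_slope(units):
    """Pool all measurements and fit one OLS line, ignoring which unit each came from."""
    xs = [x for ux, _ in units for x in ux]
    ys = [y for _, uy in units for y in uy]
    _, b, s2, sxx = ols_line(xs, ys)
    return b, (s2 / sxx) ** 0.5  # naive SE: assumes all residuals are independent


def two_stage_slope(units):
    """Stage 1: fit a line per unit. Stage 2: average the unit slopes."""
    slopes = [ols_line(xs, ys)[1] for xs, ys in units]
    return statistics.mean(slopes), statistics.stdev(slopes) / len(slopes) ** 0.5


def compare(n_reps=200, n_units=20, n_per_unit=20, seed=1):
    """Return (observed SD of pooled slope over replicates, mean naive SE)."""
    rng = random.Random(seed)
    slopes, naive_ses = [], []
    for _ in range(n_reps):
        b, se = pooled_ols_slope(simulate_units(n_units, n_per_unit, rng))
        slopes.append(b)
        naive_ses.append(se)
    return statistics.stdev(slopes), statistics.mean(naive_ses)


if __name__ == "__main__":
    sd_obs, se_naive = compare()
    print(f"observed SD of pooled slope: {sd_obs:.3f}; mean naive OLS SE: {se_naive:.3f}")
```

Under these assumed settings, the naive standard error should come out noticeably smaller than the observed spread of the pooled slope. That is the understatement of the covariance matrix that the abstract describes. The two-stage standard error is computed from the between-unit variation in slopes, so it reflects that variation directly. This unweighted average of unit slopes is only one simple variant; it assumes that every unit has enough measurements to fit its own line.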