A two-step procedure is proposed for the analysis of factorial experiments with unequal replication. The procedure entails a check for interaction in the general means model, followed by estimation of either main effects or simple effects. The use of a set of treatment mean comparisons which address the hypotheses of interest is advocated over a set which is orthogonal and dependent on the number of replications. Emphasis is on estimation of means, appropriate mean comparisons, and standard errors rather than on hypothesis testing. The problem of no replication for some treatments is briefly discussed along with the inherent difficulties. No replication is considered for two distinct cases: where interaction is of concern or interest, and where it is defined to be zero, as with many cross-classified factors in observational studies. The proposed approach to data analysis is applied to the results of a multiple cropping experiment. Care is exercised when invoking a statistical computing package so that the pitfalls of a default analysis are avoided. The aim of data analysis is to allow the experimenter to specify mean comparisons of research interest rather than rely upon the default options of computing packages.

1 Assistant Professor, Biometrics Unit, Cornell University, Ithaca, NY
2 Adjunct Professor, Dept. of Plant Breeding and Biometry, Cornell University, Ithaca, NY
3 In the Biometrics Unit technical report series, Cornell University, Ithaca, NY, 14853.

Pragmatic Methodology ... BU-969-M, Meredith & Cady, March 1988

INTRODUCTION

The analysis of treatment means from factorial experiments with unequal replication is a problem that often confronts researchers in many areas. Unequal replication may arise from economic constraints at the outset of an experiment, or from the loss of experimental units while the experiment is being conducted. Data with unequal replication have sometimes been termed "unbalanced" or "messy" in the statistical literature.
There is abundant statistical literature addressing the problem of analyzing data from experiments with unequal replication (Searle, 1971, 1987a,b; Hocking, 1984; Freund, Littell, and Spector, 1987; Speed, Hocking and Hackney, 1978). For over a decade the focus has been on the appropriate sums-of-squares to use when analyzing unbalanced cross-classified data, as in factorial experiments. The question never seemed to arise whether or not the F-tests associated with these sums-of-squares were addressing hypotheses of interest to the investigator. A consequence of this focus is that many practical investigators are now well versed in "the analysis of unbalanced data" and therefore fail to provide analyses of their experiments that are as useful or powerful as they could be. The questions inherent in most factorial experiments are best addressed via estimates of meaningful contrasts among the observed treatment means and their associated standard errors. Given the focus of the past decade, and its wealth of literature, the task of finding the appropriate procedures for the problem at hand has become unnecessarily difficult. To assist in this task, some subject matter journals have tried to specify guidelines for their prospective authors to follow in presenting the results of data analysis. For example, "Instructions to Authors" in the Agronomy Journal (1982) give some indication of how agricultural researchers should report results of experiments with well-defined treatment structures. The following is quoted from the Statistical Methods section of the "Instructions to Authors": "Whenever possible, treatment comparisons that are logical from a scientific standpoint should be made as single df contrasts as part of the analysis of variance. Orthogonality of these contrasts is desirable because information from one test is independent of others but such orthogonality is not necessary.
A more important criterion is whether the particular contrasts are meaningful and/or were planned before the data were examined."

It would seem that, with recommendations such as the above appearing in subject matter journals, instructors of statistical methods whose audience conducts (or will conduct) designed research should provide their students with a methodology that is tenable in the real world. The past focus on F-tests associated with various sums-of-squares has led such methodology astray.

With the above discussion in mind, the present article proposes a systematic approach to the analysis of factorial experiments having unequal replication. Emphasis is placed on estimating meaningful treatment mean comparisons and their standard errors. The recommended procedure uses available statistical computing packages with general linear model or regression programs, including options that easily handle (i) continuous and categorical factors, and (ii) estimation of treatment mean comparisons and their standard errors. As an example, the proposed approach is applied to data from a multiple cropping experiment.

DISCUSSION AND METHODOLOGY

Typically, a researcher is interested in estimating sample means and their associated standard errors. If the treatments are in a factorial arrangement, then well-defined single degree-of-freedom (df) contrasts may be estimated from the sample means. The standard errors associated with each contrast need to be calculated as well. It should be noted that these contrasts and standard errors are not supplied in the default output of any statistical package, since the contrasts are dictated by the objectives of the researcher in designing the experiment. Consider a 2 x 3 factorial experiment where each of the three levels of factor A occurs with each of the two levels of factor B. The statistical layout and expected cell means appear as:
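
One way to lay out such a 2 x 3 arrangement and to estimate single-df contrasts from its cell means is sketched below in Python. This is only an illustration under stated assumptions: the observations, the cell labels (a1 to a3, b1, b2), and the function name `contrast` are hypothetical and do not come from the experiment analyzed in this report. The sketch assumes a cell means model with a common error variance, estimated by pooling the within-cell variation. A contrast with coefficients c_ij on cell means with n_ij replicates is then estimated by the sum of c_ij times the observed cell mean, with standard error sqrt(MSE * sum(c_ij^2 / n_ij)).

```python
import numpy as np

# Hypothetical observations for a 2 x 3 factorial with unequal replication.
# Keys are (level of A, level of B); none of these values come from the report.
data = {
    ("a1", "b1"): [10.0, 12.0, 11.0],
    ("a1", "b2"): [14.0, 15.0],
    ("a2", "b1"): [13.0, 12.0, 14.0, 13.0],
    ("a2", "b2"): [18.0, 17.0, 19.0],
    ("a3", "b1"): [15.0, 16.0],
    ("a3", "b2"): [22.0, 21.0, 23.0],
}

cells = list(data)  # fixed cell order used by every contrast below
means = np.array([np.mean(data[c]) for c in cells])
counts = np.array([len(data[c]) for c in cells])

# Pooled within-cell variance (error mean square) of the cell means model.
ss_within = sum(
    np.sum((np.asarray(data[c]) - np.mean(data[c])) ** 2) for c in cells
)
df_error = int(counts.sum()) - len(cells)
mse = ss_within / df_error


def contrast(coefs):
    """Return (estimate, standard error) of sum(c_ij * cell mean_ij)."""
    coefs = np.asarray(coefs, dtype=float)
    estimate = float(coefs @ means)
    se = float(np.sqrt(mse * np.sum(coefs ** 2 / counts)))
    return estimate, se


# Coefficients follow the order of `cells`:
# (a1,b1), (a1,b2), (a2,b1), (a2,b2), (a3,b1), (a3,b2)

# Interaction contrast: (b2 - b1 at a1) minus (b2 - b1 at a3).
interaction = contrast([-1, 1, 0, 0, 1, -1])

# Main effect of B: b2 minus b1, averaged over A with equal weights.
b_main = contrast([-1 / 3, 1 / 3, -1 / 3, 1 / 3, -1 / 3, 1 / 3])

# Simple effect of B at level a1 of A.
b_within_a1 = contrast([-1, 1, 0, 0, 0, 0])

for name, (est, se) in [
    ("interaction (a1 vs a3)", interaction),
    ("main effect of B", b_main),
    ("B within a1", b_within_a1),
]:
    print(f"{name}: estimate = {est:.3f}, SE = {se:.3f}, error df = {df_error}")
```

In this sketch the main-effect coefficients place equal weight on each cell mean, so the comparison being estimated does not change with the number of replicates; only its standard error does. Whether the main-effect contrast or the simple-effect contrasts are the ones to report would depend on what the interaction contrasts show.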
[1] Foster B. Cady, et al. Analyzing Experimental Data by Regression, 1985.
[2] Lysbeth A. Woolcott, et al. Analysis and Interpretation, 1983.
[3] William G. Cochran, et al. Experimental Designs, 2nd ed., 1957.
[4] N. S. Urquhart, et al. Linear Models in Messy Data: Some Problems and Alternatives, 1978.
[5] S. R. Searle, et al. Linear Models for Unbalanced Data, 1988.
[6] T. A. Bancroft, et al. Analysis and Inference for Incompletely Specified Models Involving the Use of Preliminary Test(s) of Significance, 1964.
[7] R. R. Hocking, et al. Methods of Analysis of Linear Models with Unbalanced Data, 1978.
[8] S. R. Searle. Linear Models, 1971.
[9] Shayle R. Searle. Linear Models for Some-Cells-Empty Data: The Cell Means Formulation, a Consultant's Best Friend, 1987.
[10] F. M. Speed, et al. Exact F Tests for the Method of Unweighted Means in a 2^k Experiment, 1979.
[11] A. J. Barr, et al. SAS User's Guide, 1979.
[12] S. R. Searle, et al. Population Marginal Means in the Linear Model: An Alternative to Least Squares Means, 1980.
[13] R. R. Hocking. The Analysis of Linear Models, 1985.