Collapsibility and response variables in contingency tables

SUMMARY

Various definitions of the collapsibility of a hierarchical log linear model for a multidimensional contingency table are considered and shown to be equivalent. Necessary and sufficient conditions for collapsibility are found in terms of the generating class. It is shown that log linear models are appropriate for tables with response and explanatory variables if and only if they are collapsible onto the explanatory variables.

Some key words: Collapsibility; Contingency table; Graphical model; Interaction graph; Log linear model; Response variable; S-sufficiency.

... shown to be closely related. Some models have the property that relations between a set of the classifying factors may be studied by examination of the table of marginal totals formed by summing over the remaining factors. Such models are said to be collapsible onto the given set of factors. Collapsibility has important consequences for hypothesis testing and model selection, and can be useful in data reduction. We consider various definitions of collapsibility and show their equivalence. Furthermore, necessary and sufficient conditions for collapsibility are found in terms of the generating class.

Many tables analysed in practice involve response variables. Simple examples, one of which is given in § 3, suffice to show the importance of distinguishing between response and explanatory variables: first, that inappropriate models may be avoided, and second, that natural and relevant models that are not log linear may be considered. This paper characterizes appropriate and inappropriate log linear models for tables with response variables, and some alternative approaches for the analysis of such tables are briefly considered.

We consider a multidimensional contingency table N based on a set of classifying factors F.
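The operation of forming a table of marginal totals by summing over the remaining factors can be illustrated with a small computation. The following is a minimal sketch in Python, not taken from the paper: the function name `collapse`, the dictionary representation of the table, and the cell counts are all hypothetical. It only computes marginal totals; it does not test whether a given model is collapsible.

```python
from collections import Counter
from itertools import product


def collapse(table, factors, keep):
    """Sum cell counts over every factor not listed in `keep`.

    table   -- dict mapping a tuple of levels (one per factor) to a count
    factors -- tuple of factor names, in the order used by the cell tuples
    keep    -- factor names to retain; all other factors are summed out
    """
    positions = [factors.index(f) for f in keep]
    marginal = Counter()
    for cell, count in table.items():
        # Project the cell onto the retained factors and accumulate its count.
        marginal[tuple(cell[i] for i in positions)] += count
    return dict(marginal)


# Hypothetical 2 x 2 x 2 table on factors A, B, C (the counts are made up).
factors = ("A", "B", "C")
counts = [10, 20, 30, 40, 5, 15, 25, 35]
table = dict(zip(product((0, 1), repeat=3), counts))

# Marginal table on {A, B}, obtained by summing over C.
marginal_ab = collapse(table, factors, keep=("A", "B"))
print(marginal_ab)
```

For these made-up counts the marginal table on A and B has cells 30, 70, 20 and 60, and its grand total, 180, equals that of the full table, as it must for any choice of retained factors.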
For a given subset a of F we are interested in the table of marginal totals N_a, that is, the table of cell counts summed over the remaining factors a^c, the complement of a in F. We identify a hierarchical log linear model L, that is, the set of probabilities p ∈ L, with its generating class, whose elements, called generators, are given in square brackets: thus, for example, the model [AB][BCD] for a 4-way table corresponds