The Analysis of Contingency Tables: From Chi-Squared Tests and Log-Linear Models to Models of Mixed Membership

The roots of the modern log-linear model approach to the analysis of cross-classified data in the form of multi-dimensional contingency tables can be found in the work of S. N. Roy and his students in the 1950s at the University of North Carolina. These papers set the stage for two major sets of developments in the analysis of categorical data in the 1960s and 1970s. I describe some of these contributions, where they intersected and where they diverged in focus, and some subsequent advances, including the role of latent variable alternatives, mixed membership models, and methods for very large sparse categorical data arrays.
