Diagnostics in a simple correspondence analysis model: An approach based on Cook's distance for log-linear models

Diagnostics have received little attention in the literature on simple correspondence analysis models. Cook's distance, originally defined to identify influential observations in the linear regression model, has since been extended to other models, in particular to log-linear models. In this paper we provide the asymptotic distribution of Cook's distance for any log-linear model, together with a diagnostic method based on it. Using Goodman's RC(K) model as a log-linear model that approximates the ordinary simple correspondence analysis procedure, we follow a Cook's distance approach to identify influential cells. Three examples illustrate the performance of this method.
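To make the idea of cell-wise Cook's distance in a log-linear model concrete, here is a minimal Python sketch. It assumes the one-step approximation commonly used for generalized linear models, D_ij = r_ij^2 h_ij / (p (1 - h_ij)^2), where r_ij is the Pearson residual of cell (i, j), h_ij its leverage and p the number of model parameters; the paper's exact definition, scaling and asymptotic distribution may differ and are not reproduced here. The sketch fits the independence model, which is the K = 0 special case of the RC(K) model, because its maximum likelihood fit has a closed form; fitting RC(K) with K >= 1 needs an iterative algorithm that is not shown. The function names `invert` and `cooks_distance_independence` are hypothetical.

```python
from typing import List

Matrix = List[List[float]]


def invert(a: Matrix) -> Matrix:
    """Gauss-Jordan inversion with partial pivoting (small dense matrices only)."""
    n = len(a)
    m = [row[:] + [1.0 if i == j else 0.0 for j in range(n)] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        piv = m[col][col]
        m[col] = [v / piv for v in m[col]]
        for r in range(n):
            if r != col:
                f = m[r][col]
                m[r] = [vr - f * vc for vr, vc in zip(m[r], m[col])]
    return [row[n:] for row in m]


def cooks_distance_independence(table: Matrix):
    """Hypothetical helper: one-step Cook's distance for every cell under the
    Poisson independence (RC(0)) log-linear model.
    Returns (fitted, leverage, cooks), each as an I x J nested list."""
    I, J = len(table), len(table[0])
    n = sum(map(sum, table))
    row_tot = [sum(r) for r in table]
    col_tot = [sum(table[i][j] for i in range(I)) for j in range(J)]
    # ML fitted values of the independence model are closed form.
    mu = [[row_tot[i] * col_tot[j] / n for j in range(J)] for i in range(I)]

    # Design row for cell (i, j): intercept, I-1 row dummies, J-1 column dummies.
    def x(i: int, j: int) -> List[float]:
        return ([1.0]
                + [1.0 if i == k else 0.0 for k in range(1, I)]
                + [1.0 if j == k else 0.0 for k in range(1, J)])

    p = I + J - 1
    # Fisher information X' W X with W = diag(mu) (Poisson, log link).
    info = [[0.0] * p for _ in range(p)]
    for i in range(I):
        for j in range(J):
            xv = x(i, j)
            for a in range(p):
                for b in range(p):
                    info[a][b] += mu[i][j] * xv[a] * xv[b]
    inv = invert(info)

    lev, cook = [], []
    for i in range(I):
        lrow, crow = [], []
        for j in range(J):
            xv = x(i, j)
            quad = sum(xv[a] * inv[a][b] * xv[b] for a in range(p) for b in range(p))
            h = mu[i][j] * quad  # diagonal of W^1/2 X (X'WX)^-1 X' W^1/2
            r = (table[i][j] - mu[i][j]) / mu[i][j] ** 0.5  # Pearson residual
            lrow.append(h)
            crow.append(r * r * h / (p * (1.0 - h) ** 2))
        lev.append(lrow)
        cook.append(crow)
    return mu, lev, cook


if __name__ == "__main__":
    # Illustrative 3 x 3 table with one inflated cell at (2, 2).
    table = [[10, 10, 10], [10, 10, 10], [10, 10, 40]]
    mu, lev, cook = cooks_distance_independence(table)
    print(round(cook[2][2], 6))  # 8.0
```

For the independence model the leverages also have the closed form h_ij = p_i+ + p_+j - p_i+ p_+j (row and column proportions) and sum to p = I + J - 1, which is a convenient check on the general matrix computation. In the example the inflated cell (2, 2) receives by far the largest distance.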
