Robust Estimation and Outlier Detection for Overdispersed Multinomial Models of Count Data

We develop a robust estimator, the hyperbolic tangent (tanh) estimator, for overdispersed multinomial regression models of count data. The tanh estimator provides accurate estimates and reliable inferences even when the specified model fails to describe as much as half of the data. Seriously ill-fitted counts (outliers) are identified as part of the estimation. A Monte Carlo sampling experiment shows that the tanh estimator produces good results at practical sample sizes even when ten percent of the data are generated by a significantly different process. The same experiment shows that, with contaminated data, four other estimators fail: the nonrobust maximum likelihood estimator, the additive logistic model, and two seemingly unrelated regression (SUR) models. An analysis of Florida data from the 2000 presidential election using the tanh estimator recovers well-known features of the election that the other four estimators fail to capture. In an analysis of data from the 1993 Polish parliamentary election, the tanh estimator gives sharper inferences than a previously proposed heteroskedastic SUR model.
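To make the downweighting idea concrete, here is a minimal sketch of a tanh score (psi) function in the form given by Hampel, Rousseeuw and Ronchetti (1981) for optimal redescending M-estimators. It is an illustration under stated assumptions, not the paper's implementation: the tuning constants (c = 4.0, k = 5.0 and the matching A, B, p) are commonly tabulated values and may differ from those the authors use, the function names are hypothetical, and the inputs are assumed to be residuals already standardized by a robust scale estimate.

```python
import math

# Assumed tuning constants (not taken from the paper): c = 4.0, k = 5.0,
# with the corresponding A, B, p tabulated for the tanh psi function.
C, K = 4.0, 5.0
A, B, P = 0.857044, 0.911135, 1.803134


def tanh_psi(u):
    """Redescending tanh score: linear near zero, shrinking to 0 at |u| = C."""
    a = abs(u)
    if a <= P:
        return u  # well-fitted observations get full influence
    if a <= C:
        scale = math.sqrt(A * (K - 1.0))
        rate = 0.5 * math.sqrt((K - 1.0) * B * B / A)
        return math.copysign(scale * math.tanh(rate * (C - a)), u)
    return 0.0  # observations beyond C are rejected entirely


def tanh_weight(u):
    """Implied weight psi(u)/u: 1 in the centre, 0 for rejected points."""
    return 1.0 if u == 0 else tanh_psi(u) / u


def flag_outliers(residuals):
    """Indices of standardized residuals that receive zero weight."""
    return [i for i, r in enumerate(residuals) if tanh_weight(r) == 0.0]


if __name__ == "__main__":
    # Hypothetical standardized residuals, for illustration only.
    print(flag_outliers([0.3, -1.2, 6.5, 2.5, -4.1]))  # [2, 4]
```

Observations with zero weight drop out of the estimating equations, which is one way outliers can be identified as a by-product of estimation rather than in a separate screening step.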
