Comment: Struggles with Survey Weighting and Regression Modeling

Andrew Gelman's article "Struggles with survey weighting and regression modeling" addresses the question of what approach analysts should use to produce estimates (and associated estimates of variability) based on sample survey data. Gelman starts by asserting that survey weighting is a "mess." While we agree that incorporation of the survey design for regression remains challenging, with important open questions, many recent contributions to the literature have greatly clarified the situation. Examples include relatively recent contributions by Pfeffermann and Sverchkov (1999), Graubard and Korn (2002) and Little (2004). Gelman's paper is a very welcome addition to that literature.

There are some understandable reasons for the current lack of resolution. First, U.S. federal statistical agencies have been historically limited by their mission statements to producing statistical summaries, primarily means, percentages, ratios and cross-classified tables of counts. This is one explanation for why Cochran (1977) and Kish (1965) devote the great majority of their classical texts to these estimates. As a result, the job of using regression and other more complex models to learn about any causal structure underlying these summary statistics was generally left to sister policy agencies and outside users.

However, things are changing. The federal statistical system (whether it likes it or not) is becoming more involved with complex modeling. This includes small-area estimation (e.g., unemployment estimates and census net undercoverage estimates) and research into models combining information from surveys with administrative data. (There will also likely be increased demands to use data mining procedures on federal statistical data.) This relatively new development has