Methodological Issues and Challenges in the Production of Official Statistics: 24th Annual Morris Hansen Lecture

Major advances in technology and the availability of ‘big data’ create new demands for more detailed, more accurate, and timelier official statistics. The aim of this article is to discuss some of the major new methodological challenges underlying the production of official statistics (POS) in the coming years and, in some cases, to suggest ways of dealing with them. In particular, I consider the following challenges: collection and management of big data for POS with integration of computer science, increasing data accessibility while maintaining privacy and confidentiality, possible use of web panels for POS, how to deal with mode effects, measurement of error in small area estimation in conjunction with modern censuses, and integration of statistics and geospatial information. In the last part of the article, I confront the question of whether universities train students to work at National Statistical Offices.
