Total Error and Variability Measures with Integrated Disclosure Limitation for Quarterly Workforce Indicators and LEHD Origin Destination Employment Statistics in OnTheMap

We report results from the first comprehensive total quality evaluation of five major indicators in the U.S. Census Bureau's Longitudinal Employer-Household Dynamics (LEHD) Program Quarterly Workforce Indicators (QWI): total employment, beginning-of-quarter employment, full-quarter employment, total payroll, and average monthly earnings of full-quarter employees. Beginning-of-quarter employment is also the main tabulation variable in the LEHD Origin-Destination Employment Statistics (LODES) workplace reports as displayed in OnTheMap (OTM). The evaluation is conducted by generating multiple threads of the edit and imputation models used in the LEHD Infrastructure File System. These threads conform to the Rubin (1987) multiple imputation model, with each thread or implicate being the output of formal probability models that address coverage, edit, and imputation errors. Design-based sampling variability and finite population corrections are also included in the evaluation. We derive special formulas for the Rubin total variability and its components that are consistent with the disclosure avoidance system used for QWI and LODES/OTM workplace reports. These formulas allow us to publish the complete set of detailed total quality measures for QWI and LODES. The analysis reveals that the five publication variables under study are estimated very accurately for tabulations involving at least 10 jobs. Tabulations involving three to nine jobs have quality in the range generally deemed acceptable. Tabulations involving zero, one, or two jobs, which are generally suppressed in the QWI and synthesized in LODES, have substantial total variability, but their publication in LODES allows the formation of larger custom aggregations, which will in general have the accuracy estimated for tabulations in the QWI based on a similar number of workers.
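For orientation, the total variability and its components in the standard Rubin (1987) multiple imputation framework can be sketched as below. This is only the textbook combining rule, not the special formulas derived in the paper, which additionally account for the disclosure avoidance system. The function name `rubin_combine`, the dictionary keys, and the example numbers are hypothetical and chosen for illustration.

```python
from statistics import mean


def rubin_combine(estimates, within_variances):
    """Combine per-implicate results with the standard Rubin (1987) rules.

    estimates        -- one point estimate per implicate (thread)
    within_variances -- the estimated variance of each of those estimates

    Illustrative sketch only: no disclosure avoidance adjustment is applied.
    """
    m = len(estimates)
    if m < 2 or len(within_variances) != m:
        raise ValueError("need at least two implicates and matching lengths")

    q_bar = mean(estimates)            # combined point estimate
    u_bar = mean(within_variances)     # average within-implicate variance
    # Between-implicate variance, using the M - 1 denominator.
    b = sum((q - q_bar) ** 2 for q in estimates) / (m - 1)
    # Total variance: within component plus inflated between component.
    total = u_bar + (1 + 1 / m) * b

    return {
        "point_estimate": q_bar,
        "within_variance": u_bar,
        "between_variance": b,
        "total_variance": total,
    }


# Made-up numbers for three implicates of one tabulation cell.
result = rubin_combine([10.0, 12.0, 14.0], [1.0, 1.0, 1.0])
```

In this sketch the between-implicate term captures the variability due to the edit and imputation models, while the within-implicate term would carry the design-based sampling variability of each thread.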

[1]  Lena Osterhagen,et al.  Multiple Imputation For Nonresponse In Surveys , 2016 .

[2]  P. Biemer Total Survey Error: Design, Implementation, and Evaluation , 2010 .

[3]  Laura Voshell Zayatz,et al.  Using noise for disclosure limitation of establishment tabular data , 1998 .

[4]  Russell V. Lenth,et al.  Statistical Analysis With Missing Data (2nd ed.) (Book) , 2004 .

[5]  Anders Holmberg,et al.  Extending TSE to Administrative Data: A Quality Framework and Case Studies from Stats NZ , 2017 .

[6]  Polly Phipps,et al.  Development of a Quality Framework and Quality Indicators at the Bureau of Labor Statistics , 2014 .

[7]  Jeffrey S. Racine,et al.  Nonparametric estimation of distributions with categorical and continuous data , 2003 .

[8]  Ashwin Machanavajjhala,et al.  Utility Cost of Formal Privacy for Releasing National Employer-Employee Statistics , 2017, SIGMOD Conference.

[9]  Shawn A. Ross,et al.  Survey Methodology , 2005, The SAGE Encyclopedia of the Sociology of Religion.

[10]  Kevin L. McKinney,et al.  Dynamically Consistent Noise Infusion and Partially Synthetic Data as Confidentiality Protection Measures for Related Time Series , 2012 .

[11]  Rob J. Hyndman,et al.  A Bayesian approach to bandwidth selection for multivariate kernel density estimation , 2006, Comput. Stat. Data Anal..

[12]  Robert M. Groves,et al.  Total Survey Error: Past, Present, and Future , 2010 .

[13]  Cynthia Dwork,et al.  Calibrating Noise to Sensitivity in Private Data Analysis , 2006, TCC.

[14]  Kevin L. McKinney,et al.  Using Worker Flows to Measure Firm Dynamics , 2007 .

[15]  Stephen E. Fienberg,et al.  Discrete Multivariate Analysis: Theory and Practice , 1976 .

[16]  Irit Dinur,et al.  Revealing information while preserving privacy , 2003, PODS.

[17]  D. Rubin,et al.  Multiple Imputation for Interval Estimation from Simple Random Samples with Ignorable Nonresponse , 1986 .

[18]  Lars Vilhuber,et al.  The LEHD Infrastructure Files and the Creation of the Quarterly Workforce Indicators , 2009 .

[19]  Lars Vilhuber,et al.  The Sensitivity of Economic Statistics to Coding Errors in Personal Identifiers , 2003 .