A new approach for disclosure control in the IAB establishment panel—multiple imputation for a better data access

For micro-datasets considered for release as scientific or public use files, statistical agencies face the dilemma of guaranteeing the confidentiality of survey respondents on the one hand and offering sufficiently detailed data on the other. For this reason, a variety of disclosure control methods has been discussed in the literature. In this paper, we present an application of Rubin's (J. Off. Stat. 9, 462–468, 1993) idea of generating synthetic datasets from existing confidential survey data for public release. We use a set of variables from the 1997 wave of the German IAB Establishment Panel and evaluate the quality of the approach by comparing the results of an analysis by Zwick (Ger. Econ. Rev. 6(2), 155–184, 2005) on the original data with the results we obtain when running the same analysis on the datasets produced by the imputation procedure. The comparison shows that valid inferences can be obtained from the synthetic datasets in this context, while confidentiality is guaranteed for the survey participants.
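The synthesize-then-compare workflow described in the abstract can be illustrated with a small sketch. It is a hypothetical toy example, not the procedure used in the paper: a simulated population frame stands in for the IAB data, one outcome is synthesized from the posterior predictive distribution of a normal linear regression with a flat prior, and the covariate is treated as a frame variable known for every unit, as in Rubin's fully synthetic setup. The m analyses are combined with the fully synthetic variance rule of Raghunathan, Reiter and Rubin (2003), T_f = (1 + 1/m) b_m − ū_m, and utility is judged with the confidence-interval overlap measure of Karr et al. (2006). All function names, the frame, the model and the seed are illustrative assumptions.

```python
import math
import random
import statistics
from statistics import NormalDist


def fit_slope(xs, ys):
    """OLS fit of y = a + b*x; returns (a, b, residual variance, Sxx)."""
    n = len(xs)
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = my - b * mx
    sse = sum((y - a - b * x) ** 2 for x, y in zip(xs, ys))
    return a, b, sse / (n - 2), sxx


def synthesize(xs, ys, frame, n_syn, rng):
    """Draw one fully synthetic sample: new units from the frame, y from the
    posterior predictive of a normal linear regression (flat prior)."""
    n = len(xs)
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    _, b_hat, s2, sxx = fit_slope(xs, ys)
    # sigma^2 | data ~ SSE / chi2_{n-2}; chi2_k is Gamma(shape k/2, scale 2)
    sigma = math.sqrt(s2 * (n - 2) / rng.gammavariate((n - 2) / 2, 2))
    # With centred x, slope and centred intercept are independent a posteriori
    b_star = rng.gauss(b_hat, sigma / math.sqrt(sxx))
    c_star = rng.gauss(my, sigma / math.sqrt(n))
    new_x = rng.sample(frame, n_syn)  # fresh units drawn from the frame
    new_y = [c_star + b_star * (x - mx) + rng.gauss(0, sigma) for x in new_x]
    return new_x, new_y


def combine_fully_synthetic(estimates, variances):
    """Combine m point estimates and variances with T_f = (1 + 1/m) b_m - u_m."""
    m = len(estimates)
    q_bar = statistics.fmean(estimates)
    b_m = statistics.variance(estimates)  # between-dataset variance, divisor m - 1
    u_bar = statistics.fmean(variances)   # average within-dataset variance
    t_f = (1 + 1 / m) * b_m - u_bar
    if t_f <= 0:
        t_f = u_bar  # hypothetical fallback: T_f can be negative for small m
    return q_bar, t_f


def ci_overlap(lo_o, hi_o, lo_s, hi_s):
    """Interval overlap of Karr et al.: 1 for identical intervals, 0 if disjoint."""
    lo, hi = max(lo_o, lo_s), min(hi_o, hi_s)
    if hi <= lo:
        return 0.0
    return (hi - lo) / (2 * (hi_o - lo_o)) + (hi - lo) / (2 * (hi_s - lo_s))


def demo(m=50, seed=1):
    rng = random.Random(seed)
    # Simulated frame and "confidential" sample (toy data, not IAB data)
    frame = [rng.uniform(0, 10) for _ in range(5000)]
    sample_x = rng.sample(frame, 500)
    sample_y = [2 + 0.5 * x + rng.gauss(0, 1) for x in sample_x]

    z = NormalDist().inv_cdf(0.975)
    _, b_orig, s2, sxx = fit_slope(sample_x, sample_y)
    se_orig = math.sqrt(s2 / sxx)

    estimates, variances = [], []
    for _ in range(m):
        sx, sy = synthesize(sample_x, sample_y, frame, len(sample_x), rng)
        _, b_syn, s2_syn, sxx_syn = fit_slope(sx, sy)
        estimates.append(b_syn)
        variances.append(s2_syn / sxx_syn)
    q_bar, t_f = combine_fully_synthetic(estimates, variances)
    se_syn = math.sqrt(t_f)
    overlap = ci_overlap(b_orig - z * se_orig, b_orig + z * se_orig,
                         q_bar - z * se_syn, q_bar + z * se_syn)
    return b_orig, q_bar, overlap


if __name__ == "__main__":
    print(demo())
```

`demo()` returns the slope estimated on the original sample, the combined estimate from the m synthetic samples, and the interval overlap; values close to 1 mean the synthetic data reproduce the original inference well. For simplicity the 95% intervals use a normal approximation rather than the t reference distribution with its small-sample degrees of freedom.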

[1] Ruth Brand. Anonymität von Betriebsdaten: Verfahren zur Erfassung und Maßnahmen zur Verringerung des Reidentifikationsrisikos, 2000.

[2] Simon D. Woodcock, et al. Disclosure Limitation in Longitudinal Linked Data, 2002.

[3] Gerd Ronning, et al. Estimation of the Probit Model from Anonymized Micro Data, 2006.

[4] Donald B. Rubin, et al. Multiple imputations in sample surveys, 1978.

[5] John M. Abowd, et al. Multiply-Imputing Confidential Characteristics and File Links in Longitudinal Linked Data, 2004, Privacy in Statistical Databases.

[6] Michael Moritz, et al. The German-Czech Border Region after the Fall of the Iron Curtain: Effects on the Labour Market, 2007.

[7] Michael Lechner, et al. Are Training Programs More Effective When Unemployment Is High?, 2006, Journal of Labor Economics.

[8] Jerome P. Reiter, et al. Satisfying Disclosure Restrictions With Synthetic Data Sets, 2002.

[9] Gesine Stephan, et al. Wage distributions by wage-setting regime, 2005.

[10] Jerome P. Reiter, et al. Releasing multiply-imputed synthetic data generated in two stages to protect confidentiality, 2007.

[11] Corinna Kleinert, et al. Does Unemployment Help or Hinder Becoming Independent? The Role of Employment Status for Leaving the Parental Home, 2007.

[12] Claus Schnabel, et al. How fast do newly founded firms mature? Empirical analyses on job quality in start-ups, 2005.

[13] Donald B. Rubin, et al. The Design of a General and Flexible System for Handling Nonresponse in Sample Surveys, 2004.

[14] Susanne Rässler, et al. Analyzing the changing gender wage gap based on multiply imputed right censored wages, 2005.

[15] Christian Gaggermeier, et al. Pension and children: Pareto improvement with heterogeneous preferences, 2006.

[16] John Van Hoewyk, et al. A multivariate technique for multiply imputing missing values using a sequence of regression models, 2001.

[17] Thomas Zwick. Continuing Vocational Training Forms and Establishment Productivity in Germany, 2005.

[18] Cornelia Kristen, et al. The educational attainment of the second generation in Germany, 2007.

[19] Holger Seibert. Frühe Flexibilisierung? Regionale Mobilität nach der Lehrausbildung in Deutschland zwischen 1977 und 2004, 2007.

[20] Josep Domingo-Ferrer, et al. Inference Control in Statistical Databases, 2002, Lecture Notes in Computer Science.

[21] D. Rubin, et al. Statistical Analysis with Missing Data, 1988.

[22] Helmut Rudolph. Indikator gesteuerte Verteilung von Eingliederungsmitteln im SGB II: Erfolgs- und Effizienzkriterien als Leistungsanreiz?, 2006.

[23] David I. Levine, et al. The acceptability of layoffs and pay cuts: comparing North America with Germany, 2005.

[24] Jörg Höhne. Anonymisierungsverfahren für Paneldaten, 2008, AStA Wirtschafts- und Sozialstatistisches Archiv.

[25] Reinhard Hujer, et al. The effects of job creation schemes on the unemployment duration in East Germany, 2006.

[26] R. Little, et al. Selective Multiple Imputation of Keys for Statistical Disclosure Control in Microdata, 2003.

[27] Lutz Bellmann, et al. Das IAB-Betriebspanel: Konzeption und Anwendungsbereiche, 2002.

[28] W. Eichhorst, et al. Activation Policies in Germany: From Status Protection to Basic Income Support, 2007.

[29] Roman Lutz, et al. Was spricht eigentlich gegen eine private Arbeitslosenversicherung?, 2007.

[30] Michael Lechner, et al. Waiting for the Economy to Take Off: Active Labour Market Policy in East Germany, 2009.

[31] L. Willenborg, et al. Elements of Statistical Disclosure Control, 2000.

[32] Annekatrin Niebuhr, et al. Migration and innovation: Does cultural diversity matter for regional R&D activity?, 2010.

[33] Susanne Rässler, et al. Wirkungsanalyse in der Bundesagentur für Arbeit: Konzeption, Datenbasis und ausgewählte Befunde, 2006.

[34] Gerd Ronning, et al. Post-Randomization Under Test: Estimation of the Probit Model, 2005.

[35] Roderick J. A. Little, et al. Statistical Analysis with Missing Data, 2002.

[36] Sandra Gottschalk. Unternehmensdaten zwischen Datenschutz und Analysepotenzial, 2005.

[37] Uwe Blien, et al. Model-based classification of regional labour markets: for purposes of labour market policy, 2006.

[38] Alexandra Schmucker, et al. The IAB establishment panel: from sample to survey to projection, 2008.

[39] P. Doyle, et al. Confidentiality, Disclosure and Data Access: Theory and Practical Applications for Statistical Agencies, 2001.

[40] Stefan Bender, et al. Die IAB-Beschäftigtenstichprobe, 2002.

[41] Jerome P. Reiter, et al. Releasing multiply imputed, synthetic public use microdata: an illustration and empirical study, 2005.

[42] Jerome P. Reiter, et al. Multiple Imputation for Statistical Disclosure Limitation, 2003.

[43] Stefan Bender, et al. The linked employer-employee dataset of the IAB (LIAB), 2005.

[44] Ruth Brand, et al. Microdata Protection through Noise Addition, 2002, Inference Control in Statistical Databases.

[45] Stefan Bender, et al. The wage effects of entering motherhood, 2007.

[46] Joachim Wagner, et al. Neue Möglichkeiten zur Nutzung vertraulicher amtlicher Personen- und Firmendaten (Research Data Centers and new ways to access confidential official micro data), 2007.

[47] Bernd Fitzenberger, et al. Get Training or Wait? Long-Run Employment Effects of Training Programs for the Unemployed in West Germany, 2006, SSRN Electronic Journal.

[48] Susanne Rässler, et al. Das TrEffeR-Projekt der Bundesagentur für Arbeit: die Wirkung von Maßnahmen aktiver Arbeitsmarktpolitik, 2007.

[49] Anna Oganian, et al. A Framework for Evaluating the Utility of Data Altered to Protect Confidentiality, 2006.

[50] Xiao-Li Meng, et al. Multiple-Imputation Inferences with Uncongenial Sources of Input, 1994.

[51] Antje Mertens, et al. Are Fixed-Term Jobs Bad for Your Health? A Comparison of West-Germany and Spain, 2007.

[52] Christian Hohendanner, et al. Verdrängen Ein-Euro-Jobs sozialversicherungspflichtige Beschäftigung in den Betrieben?, 2007.

[53] Ramona Pohl, et al. Neue Datenangebote in den Forschungsdatenzentren: Betriebs- und Unternehmensdaten im Längsschnitt, 2008, AStA Wirtschafts- und Sozialstatistisches Archiv.

[54] A. Kennickell. Multiple Imputation and Disclosure Protection: The Case of the 1995 Survey of Consumer Finances, 2000.

[55] Stefan Bender, et al. The Wage Effects of Entering Motherhood: A Within-Firm Matching Approach, 2006.

[56] Stephan L. Thomsen, et al. Identifying effect heterogeneity to improve the efficiency of job creation schemes in Germany, 2008.

[57] Johannes Ludsteck, et al. Employment effects of centralization in wage setting in a median voter model, 2006.

[58] Roger A. Sugden, et al. Multiple Imputation for Nonresponse in Surveys, 1988.

[59] Stefan Bender, et al. IAB Employment Subsample 1975–1995: Opportunities for Analysis Provided by the Anonymised Subsample, 2000, SSRN Electronic Journal.

[60] Stephan L. Thomsen, et al. Individual employment effects of job creation schemes in Germany with respect to sectoral heterogeneity, 2005.

[61] John M. Abowd, et al. New Approaches to Confidentiality Protection: Synthetic Data, Remote Access and Research Data Centers, 2004, Privacy in Statistical Databases.

[62] Thomas Rothe, et al. Labour market dynamics from a regional perspective: The multi-account system, 2005.

[63] D. Rubin, et al. Small-sample degrees of freedom with multiple imputation, 1999.

[64] Johann Fuchs, et al. Effekte alternativer Annahmen auf die prognostizierte Erwerbsbevölkerung, 2006.

[65] D. Rubin, et al. Multiple Imputation for Interval Estimation from Simple Random Samples with Ignorable Nonresponse, 1986.

[66] Gesine Stephan, et al. How collective contracts and works councils reduce the gender wage gap, 2004.

[67] Julia Lane, et al. Optimizing the Use of Micro-Data: An Overview of the Issues, 2005.