Official Statistics Data Integration for Enhanced Information Quality

This work concerns the integrated analysis of data collected as official statistics with administrative data from operational systems, with the aim of increasing the quality of information. Information quality, or InfoQ, is ‘the potential of a data set to achieve a specific goal by using a given empirical analysis method’. InfoQ is built on four interacting components (the analysis goal, the data, the data analysis and the utility) and is assessed through eight dimensions: data resolution, data structure, data integration, temporal relevance, generalizability, chronology of data and goal, construct operationalization and communication. The paper illustrates, through case studies, a novel strategy for increasing InfoQ by integrating official statistics with administrative data using copulas and Bayesian Networks. Official statistics are extraordinary sources of information. However, because of shortcomings in temporal relevance and in chronology of data and goal, these fundamental sources are often not properly leveraged, which results in a low level of InfoQ when official statistics are used. This in turn leads to low-value statistical analyses and insufficiently informative results. By improving temporal relevance and chronology of data and goal, Bayesian Networks allow us to calibrate official statistics with administrative data, thus strengthening the quality of the information derived from official surveys and, overall, enhancing InfoQ. We show, with examples, how to design and implement such a calibration strategy. Copyright © 2015 John Wiley & Sons, Ltd.
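To make the copula idea mentioned in the abstract concrete, the following minimal sketch fits a Gaussian copula to two paired variables and then samples from it. It is an illustration only and not the calibration procedure developed in the paper. The variable names (`survey`, `admin`), the synthetic data, the choice of a Gaussian copula and the rank-based fitting step are all assumptions introduced here.

```python
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal, used for scores and the copula


def normal_scores(values):
    """Map values to normal scores via their ranks (assumes no ties)."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    scores = [0.0] * n
    for rank, i in enumerate(order, start=1):
        # rank / (n + 1) is a pseudo-observation strictly inside (0, 1)
        scores[i] = STD_NORMAL.inv_cdf(rank / (n + 1))
    return scores


def pearson(a, b):
    """Plain Pearson correlation of two equally long sequences."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / (va * vb) ** 0.5


def fit_gaussian_copula(x, y):
    """Estimate the Gaussian copula parameter from paired observations."""
    return pearson(normal_scores(x), normal_scores(y))


def sample_gaussian_copula(rho, n, rng):
    """Draw n pairs (u, v) on the unit square from a Gaussian copula."""
    scale = max(0.0, 1.0 - rho ** 2) ** 0.5
    pairs = []
    for _ in range(n):
        z1 = rng.gauss(0.0, 1.0)
        z2 = rho * z1 + scale * rng.gauss(0.0, 1.0)
        pairs.append((STD_NORMAL.cdf(z1), STD_NORMAL.cdf(z2)))
    return pairs


# Hypothetical, synthetic data: a survey variable and a related
# administrative variable observed on the same 200 units.
rng = random.Random(0)
survey = [rng.gauss(50.0, 10.0) for _ in range(200)]
admin = [0.8 * s + rng.gauss(0.0, 5.0) for s in survey]

rho_hat = fit_gaussian_copula(survey, admin)
pairs = sample_gaussian_copula(rho_hat, 5, rng)
print(f"estimated copula correlation: {rho_hat:.2f}")
```

The sampled pairs lie on the unit square and would be mapped back to the original scales through each variable's own marginal distribution, which is what lets a copula separate the dependence structure from the marginals. Other copula families or a vine construction could be substituted where the Gaussian dependence assumption is too restrictive.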
