Social media as a data source for official statistics; the Dutch Consumer Confidence Index

This paper addresses the question of how alternative data sources, such as administrative and social media data, can be used in the production of official statistics. Since most surveys at national statistical institutes are conducted repeatedly over time, a multivariate structural time series modelling approach is proposed to model the series observed by a repeated survey together with related series obtained from such alternative data sources. Generally, this improves the precision of the direct survey estimates by using sample information observed in preceding periods and information from related auxiliary series. The model also makes it possible to exploit the higher frequency of social media data to produce more precise survey estimates in real time, at the moment when statistics from social media are available but the sample data are not yet. The concept of cointegration is applied to assess to what extent the alternative series represent the same phenomena as the series observed with the repeated survey. The methodology is applied to the Dutch Consumer Confidence Survey and a sentiment index derived from social media.
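The nowcasting idea can be illustrated with a minimal sketch. It assumes a deliberately simplified bivariate local level model (a random-walk level plus noise for each series, with correlated level disturbances), not the full structural model of the paper, which would also need components such as seasonality and survey error. The function name, the variance values, and the simulated data are hypothetical and chosen only for illustration. The Kalman filter skips missing entries, so in the final period the survey level is updated through the auxiliary series alone.

```python
import numpy as np

def kalman_filter_local_level(y, Q, H, a0, P0):
    """Kalman filter for a bivariate local level model.

    y  : (n, 2) array; column 0 = survey series, column 1 = auxiliary series.
         NaN entries are treated as missing observations.
    Q  : (2, 2) covariance of the level disturbances.
    H  : (2, 2) covariance of the measurement errors.
    Returns the filtered levels and their covariance matrices.
    """
    n = y.shape[0]
    a, P = a0.astype(float).copy(), P0.astype(float).copy()
    filtered = np.zeros((n, 2))
    filtered_cov = np.zeros((n, 2, 2))
    for t in range(n):
        # Prediction: random-walk levels keep their mean; uncertainty grows by Q.
        P = P + Q
        observed = ~np.isnan(y[t])
        if observed.any():
            Z = np.eye(2)[observed]                 # rows selecting observed series
            v = y[t, observed] - Z @ a              # one-step-ahead prediction error
            F = Z @ P @ Z.T + H[np.ix_(observed, observed)]
            K = P @ Z.T @ np.linalg.inv(F)          # Kalman gain
            a = a + K @ v
            P = P - K @ Z @ P
        filtered[t] = a
        filtered_cov[t] = P
    return filtered, filtered_cov

# Simulated data (illustrative values only, not taken from the paper).
rng = np.random.default_rng(0)
n, rho = 60, 0.9
Q = np.array([[1.0, rho], [rho, 1.0]])   # strongly correlated level disturbances
H = np.diag([4.0, 1.0])                  # survey assumed noisier than auxiliary series
levels = np.cumsum(rng.multivariate_normal(np.zeros(2), Q, size=n), axis=0)
y = levels + rng.multivariate_normal(np.zeros(2), H, size=n)
y[-1, 0] = np.nan                        # survey figure for the last period not yet available

a0, P0 = np.zeros(2), 1e6 * np.eye(2)    # large initial variance as a rough diffuse prior
filt, filt_cov = kalman_filter_local_level(y, Q, H, a0, P0)
_, filt_cov_indep = kalman_filter_local_level(y, np.eye(2), H, a0, P0)

# Variance of the survey level in the last period, with and without
# correlation between the two level disturbances.
nowcast_var_corr = filt_cov[-1, 0, 0]
nowcast_var_indep = filt_cov_indep[-1, 0, 0]
```

Under these assumptions, `nowcast_var_corr` is smaller than `nowcast_var_indep`: the more strongly the level disturbances of the two series are correlated, the more the auxiliary series reduces the uncertainty of the survey level in a period where the survey itself is missing. In a structural time series setting, a correlation approaching one corresponds to the two series sharing a common level.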
