Big Data as a Source for Official Statistics

Abstract

More and more data are being produced by a growing number of electronic devices, both those physically surrounding us and those on the internet. The large volume of these data and the high frequency at which they are produced led to the introduction of the term ‘Big Data’. Because these data reflect many different aspects of our daily lives, and because they are abundant and available, Big Data sources are very interesting from an official statistics point of view. This article explores both the opportunities and the challenges that the application of Big Data poses for official statistics. Experiences gained from analysing large amounts of Dutch traffic loop detection records and Dutch social media messages are described to illustrate the issues characteristic of the statistical analysis and use of Big Data.
