Big Data, Official Statistics and Some Initiatives by the Australian Bureau of Statistics

Official statisticians have been dealing with a diversity of data sources for decades. However, new sources of data in the Big Data domain provide an opportunity to deliver a more efficient and effective statistical service. This paper outlines a number of considerations for the official statistician when deciding whether to embrace a particular new data source in the regular production of official statistics. The principal considerations are relevance, business benefit, and the validity of using the source for official statistics in finite population inferences or analytic inferences. The paper also describes the Big Data Flagship Project of the Australian Bureau of Statistics (ABS), which has been established to provide the opportunity for the ABS to gain practical experience in assessing the business, statistical, technical, computational and other issues in using Big Data. In addition, ABS participation in national and international activities in this area will help it share experience and knowledge, while collaboration with academics will enable ABS to better acquire the capability to address business problems using the new sources of data as part of the solution.
