Big data and semantic technology: A future for data integration, exploration and visualisation

In a world of ever-increasing data availability and user expectations, National Statistical Offices face mounting challenges in producing relevant and timely statistics. They need to transform their business practice to take advantage of big data, especially administrative data, by integrating non-traditional and survey data sources to maximise value and by using new technology to enable enhanced analysis. One response to these challenges is the prototype GLIDE (Graphically Linked Information Discovery Environment), which the Australian Bureau of Statistics (ABS) is currently developing using semantic web technology. As a test case, this environment includes a prototype semantic linked employer-employee database (LEED), which integrates administrative tax data and ABS business register data to enable detailed microeconomic analysis. As data structures become more complex and multi-dimensional, data integration and exploration encounter challenges within traditional relational databases, prompting the exploration of alternatives. Semantic web technology allows for a flexible data structure, machine reasoning and inference on the dataset, a shared understanding of the data's meaning, reusable classifications and standards, easy exploration of many dimensions, and network analysis. The possible advantages of such an approach for official statistics are demonstrated through two practical examples, which show how the prototype GLIDE supports effective data exploration and visualisation, and enables network analysis, to solve real business problems.
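To make the idea of a flexible, inference-friendly data structure concrete, the following is a minimal sketch in plain Python of a subject-predicate-object store with one transitive inference rule. It is not the GLIDE or LEED implementation: the identifiers (`emp:alice`, `biz:acme`, `grp:holdings`), the predicates (`worksFor`, `partOf`, `industry`) and the helper functions are all hypothetical and chosen only for illustration.

```python
# Illustrative only: a tiny in-memory triple store. All identifiers and
# predicate names below are hypothetical, not taken from GLIDE or the LEED.
triples = {
    ("emp:alice", "worksFor", "biz:acme"),
    ("emp:bob", "worksFor", "biz:acme"),
    ("emp:carol", "worksFor", "biz:bolt"),
    ("biz:acme", "partOf", "grp:holdings"),
    ("biz:bolt", "partOf", "grp:holdings"),
    ("grp:holdings", "partOf", "grp:global"),
}


def match(store, s=None, p=None, o=None):
    """Return all triples matching the pattern; None acts as a wildcard."""
    return {
        t for t in store
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    }


def infer_transitive(store, predicate):
    """Add the triples implied by treating `predicate` as transitive."""
    inferred = set(store)
    changed = True
    while changed:
        changed = False
        for a, _, b in match(inferred, p=predicate):
            for _, _, c in match(inferred, s=b, p=predicate):
                if (a, predicate, c) not in inferred:
                    inferred.add((a, predicate, c))
                    changed = True
    return inferred


def employees_in(store, group):
    """List employees working for any unit recorded as part of `group`."""
    units = {s for s, _, _ in match(store, p="partOf", o=group)}
    return sorted(s for s, _, o in match(store, p="worksFor") if o in units)


# A new attribute is just another triple; no schema change is needed.
triples.add(("biz:acme", "industry", "Manufacturing"))

closed = infer_transitive(triples, "partOf")
print(employees_in(closed, "grp:global"))
```

Before inference, the store holds no direct link between the businesses and `grp:global`, so the same query on `triples` finds no employees; after inference the links are explicit and the query can follow them. A production system would more likely express this with RDF and SPARQL than with hand-written set operations.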
