Creating a surrogate commuter network from Australian Bureau of Statistics census data

Between the 2011 and 2016 national censuses, the Australian Bureau of Statistics (ABS) changed the system it uses to comply with its anonymity policy when distributing census data. The new method produces dramatic inconsistencies between low-resolution data and aggregated high-resolution data. Totals obtained by summing fine-resolution cells do not match the directly tabulated (true) totals, and the mismatch grows as the data resolution becomes finer. Here, we address several aspects of this inconsistency in the 2016 usual-residence to place-of-work travel data. We introduce a re-sampling system that rectifies many of the artifacts introduced by the new ABS protocol, giving a higher level of consistency across partition sizes. We offer a surrogate high-resolution 2016 commuter dataset that reduces the difference between the aggregated and true commuter totals from ~34% to only ~7%, which is on the order of the discrepancy across partition resolutions in data from earlier years.

Design Type(s): modeling and simulation objective • network analysis objective • data validation objective
Measurement Type(s): population data
Technology Type(s): computational modeling technique
Factor Type(s): geographic location
Sample Characteristic(s): Australia • anthropogenic habitat
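The Python sketch below illustrates, under stated assumptions, why summing independently perturbed fine-resolution cells can drift away from a directly tabulated total as cells get smaller, and how a generic multinomial re-sampling step can force fine cells back onto known coarse totals. It measures the mismatch as a relative discrepancy, |aggregated − true| / true; this is assumed to be the kind of quantity behind the ~34% and ~7% figures, but the paper's exact definition may differ. The noise model in `perturb` is hypothetical and is not the ABS confidentialisation algorithm, and `resample_blocks` is a textbook multinomial technique, not the authors' re-sampling system. All function names, parameters, and counts are illustrative.

```python
import random


def relative_discrepancy(aggregated_total, true_total):
    """|sum of fine cells - directly tabulated total| / directly tabulated total."""
    return abs(aggregated_total - true_total) / true_total


def perturb(counts, rng, max_shift=2):
    """HYPOTHETICAL noise model (not the ABS algorithm): shift each cell
    independently by a random integer in [-max_shift, max_shift], clip at 0."""
    return [max(0, c + rng.randint(-max_shift, max_shift)) for c in counts]


def resample_to_total(weights, total, rng):
    """Multinomial re-sampling: place `total` individuals into cells with
    probability proportional to `weights` (uniform if all weights are zero)."""
    if sum(weights) == 0:
        weights = [1] * len(weights)
    out = [0] * len(weights)
    for i in rng.choices(range(len(weights)), weights=weights, k=total):
        out[i] += 1
    return out


def resample_blocks(blocks, coarse_totals, rng):
    """Re-sample each coarse block's fine cells so that they sum exactly to
    that block's directly tabulated coarse total."""
    return [resample_to_total(w, t, rng) for w, t in zip(blocks, coarse_totals)]


if __name__ == "__main__":
    rng = random.Random(2016)
    total = 5000  # hypothetical number of commuters in one coarse region
    for n_cells in (10, 100, 1000, 10000):
        # Scatter people uniformly over n_cells, then perturb every cell.
        true_cells = resample_to_total([1] * n_cells, total, rng)
        released = perturb(true_cells, rng)
        print(n_cells, round(relative_discrepancy(sum(released), total), 3))
    # Re-sample the finest (10000-cell) release back onto the coarse total.
    fixed = resample_blocks([released], [total], rng)[0]
    print("after re-sampling:", relative_discrepancy(sum(fixed), total))  # 0.0
```

In this toy model, clipping at zero biases small cells upward, so the error in the sum grows with the number of cells; real ABS perturbation may behave differently. Multinomial re-sampling reproduces each block's coarse total exactly by construction, but how well it recovers the finer structure depends on how informative the perturbed cell counts are.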
