Mapping the geodemographics of digital inequality in Great Britain: An integration of machine learning into small area estimation

Abstract

Geographic variation in digital inequality arises from a range of demographic, attitudinal, behavioural and locational factors. To better understand this multidimensional geography, our paper develops a new geodemographic classification covering Great Britain. The model integrates a range of new small area measures drawn from multiple forms of data, including consumer purchasing data, surveys and open data sources. Our analytical approach integrates machine learning into a small area estimation technique to obtain Lower Super Output Area / Data Zone estimates of Internet use, alongside a range of online engagement and consumption measures. Having collated these input measures, we implemented a more standard geodemographic framework that uses the unsupervised k-means clustering algorithm to map the multidimensional characteristics of digital inequality across Great Britain, creating the Internet User Classification (IUC). Our outputs provide a new and nuanced understanding of the salient contemporary characteristics of digital inequality in Great Britain. We evaluate the classification both internally and externally in the context of preparations for the 2021 UK Census of Population, exploring the geodemographic patterns of Census test response rates and the propensity to complete the survey online. This work illustrates the strength of a geodemographic approach for mapping spatial patterns of digital inequality, and the Census application demonstrates how the IUC can be operationalised in such settings for local intervention or benchmarking.
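The k-means step described above can be illustrated with a minimal sketch in plain Python. This is a hypothetical illustration, not the authors' implementation: the z-score standardisation, the number of random restarts (`n_init`), the seed, the helper names (`standardise`, `kmeans_once`, `kmeans`) and the toy input values are all assumptions chosen for clarity. Each row stands for one small area (e.g. an LSOA or Data Zone) and each column for one input measure; the sketch standardises the measures, runs Lloyd's algorithm several times from random seeds, and keeps the partition with the lowest within-cluster sum of squares (WCSS).

```python
import math
import random
from typing import List, Optional, Sequence, Tuple

Vector = List[float]


def standardise(rows: Sequence[Sequence[float]]) -> List[Vector]:
    """Z-score each input measure (column) so no single variable dominates distances.

    Assumption: z-scores are used here for illustration; other standardisations exist.
    """
    n = len(rows)
    n_cols = len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(n_cols)]
    sds = []
    for j in range(n_cols):
        var = sum((r[j] - means[j]) ** 2 for r in rows) / n
        sds.append(math.sqrt(var) or 1.0)  # constant column: avoid dividing by zero
    return [[(r[j] - means[j]) / sds[j] for j in range(n_cols)] for r in rows]


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance, the criterion k-means minimises."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans_once(points: List[Vector], k: int, rng: random.Random,
                max_iter: int = 100) -> Tuple[List[int], float]:
    """One run of Lloyd's algorithm, seeded with k randomly chosen areas."""
    centroids = [list(p) for p in rng.sample(points, k)]
    labels: List[int] = []
    for _ in range(max_iter):
        # Assignment step: each area joins its nearest centroid.
        new_labels = [min(range(k), key=lambda c: sq_dist(p, centroids[c]))
                      for p in points]
        if new_labels == labels:
            break  # no area changed cluster, so the run has converged
        labels = new_labels
        # Update step: move each centroid to the mean of its member areas.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    # Within-cluster sum of squares (WCSS) scores the final partition.
    wcss = sum(sq_dist(p, centroids[lab]) for p, lab in zip(points, labels))
    return labels, wcss


def kmeans(rows: Sequence[Sequence[float]], k: int, n_init: int = 10,
           seed: int = 0) -> Tuple[List[int], float]:
    """Hypothetical helper: run k-means n_init times, keep the lowest-WCSS result."""
    if n_init < 1:
        raise ValueError("n_init must be at least 1")
    points = standardise(rows)
    rng = random.Random(seed)
    best: Optional[Tuple[List[int], float]] = None
    for _ in range(n_init):
        labels, wcss = kmeans_once(points, k, rng)
        if best is None or wcss < best[1]:
            best = (labels, wcss)
    assert best is not None
    return best


if __name__ == "__main__":
    # Invented toy inputs: one row per small area; the columns stand in for
    # measures such as the estimated % of residents using the Internet and
    # the % shopping online.
    areas = [
        [92.0, 71.0], [90.0, 68.0], [94.0, 74.0],
        [55.0, 20.0], [58.0, 24.0], [52.0, 18.0],
    ]
    labels, wcss = kmeans(areas, k=2)
    print("cluster labels:", labels)
    print("WCSS:", round(wcss, 3))
```

On the toy data the three high-engagement areas should fall in one cluster and the three low-engagement areas in the other; which group receives label 0 depends on the random seeding. In a real classification, the choice of k, the selection of input variables and the standardisation method are substantive decisions that this sketch does not address.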
