A simple and direct method to analyse the influences of sampling fractions on modelling intra-city human mobility

ABSTRACT The sampling fraction is crucial to sampling-related studies and applications, especially in the big data era, when most data are neither purpose-designed nor controllable during collection. A common question among researchers is ‘How accurate is a model built from a sample?’. Taking intra-city human mobility as the study subject, this study uses a simple and direct method to analyse how different sampling fractions influence modelling accuracy. Five common intra-city human mobility indicators (travel distance, travel time, travel frequency, radius of gyration and movement entropy) are evaluated in terms of their mean, median and probability distribution. Experimental results demonstrate that the representativeness of each indicator converges to 1 at its own rate and with its own variance. The minimum sampling fraction required to reach a given accuracy differs across indicators and evaluation measures. To investigate how related factors affect the modelling accuracy obtained at a given sampling fraction, additional experiments are conducted with multiple sampling methods, study scopes and data sources. Several interesting general findings are observed. This study provides a reference for other sampling-based applications.
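The kind of experiment described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the study's implementation: the abstract does not define representativeness, so the sketch assumes it is one minus the relative error of the sample statistic; the population is synthetic; sampling is simple random sampling without replacement; and the names `synthetic_population`, `representativeness` and `evaluate_fraction` are hypothetical. Only one indicator (radius of gyration) and one measure (the mean) are shown.

```python
import math
import random
import statistics


def radius_of_gyration(points):
    """RMS distance of visited points (planar x, y) from their centroid."""
    cx = sum(x for x, _ in points) / len(points)
    cy = sum(y for _, y in points) / len(points)
    return math.sqrt(
        sum((x - cx) ** 2 + (y - cy) ** 2 for x, y in points) / len(points)
    )


def synthetic_population(n_users=2000, seed=42):
    """Hypothetical data: one radius of gyration per simulated user."""
    rng = random.Random(seed)
    values = []
    for _ in range(n_users):
        spread = rng.uniform(0.5, 5.0)  # assumed per-user activity range
        n_points = rng.randint(5, 20)   # assumed number of visited points
        points = [(rng.gauss(0, spread), rng.gauss(0, spread))
                  for _ in range(n_points)]
        values.append(radius_of_gyration(points))
    return values


def representativeness(sample_value, population_value):
    """ASSUMED definition: 1 minus relative error (1 means a perfect match)."""
    return 1.0 - abs(sample_value - population_value) / abs(population_value)


def evaluate_fraction(values, fraction, repeats=100, seed=0):
    """Mean and variance of representativeness over repeated random samples."""
    rng = random.Random(seed)
    n = max(1, round(fraction * len(values)))
    population_mean = statistics.fmean(values)
    scores = [
        representativeness(statistics.fmean(rng.sample(values, n)),
                           population_mean)
        for _ in range(repeats)
    ]
    return statistics.fmean(scores), statistics.pvariance(scores)


if __name__ == "__main__":
    population = synthetic_population()
    for fraction in (0.01, 0.05, 0.1, 0.5, 1.0):
        mean_score, variance = evaluate_fraction(population, fraction)
        print(f"fraction={fraction:<5} mean={mean_score:.4f} var={variance:.2e}")
```

Replacing `statistics.fmean` with `statistics.median`, or comparing sample and population histograms with a divergence measure, would extend the same loop to the other two evaluation measures; the smallest fraction whose mean score exceeds a chosen threshold then gives the minimum required sampling fraction for that indicator and measure.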
