Sample Bias due to Missing Data in Mobility Surveys

A growing number of companies use mobility information in their day-to-day business. One requirement is that the data support inference about population-wide mobility patterns. It is therefore important not only to find mobility patterns in a given data sample but also to assess their validity for the total population. This aspect of analysis has been largely neglected in mobility data mining research, which limits the applicability of the field's algorithms. In this paper we analyze one aspect of sample bias that arises from incomplete mobility data. We provide a systematic approach to detect dependencies between mobility behavior, socio-demography, and missing data. We then apply the approach to a large GPS mobility survey in Switzerland and show that our concerns are justified and require attention in future research. We hope that this paper raises awareness that, because of missing data, the representativeness of mobility behavior in mobility surveys cannot be taken for granted, and that this is a research direction of utmost importance.
