Outlier Detection and Comparison of Origin-Destination Flows Using Data Depth

Abstract: Advances in location-aware technology have produced massive volumes of trajectory data. Origin-destination (OD) trajectories provide rich information on urban flow and transport demand. This study describes a new method for detecting OD flow outliers and for testing whether two OD flow datasets differ in spatial extent, that is, spread. The proposed method is based on data depth, which measures the centrality and outlyingness of a point with respect to a given dataset in R^d. Using the center-outward ordering that data depth induces, the proposed method analyzes the underlying characteristics of OD flows, such as location, outlyingness, and spread. The ability of the method to detect OD anomalies is compared with that of the Mahalanobis distance approach, and an F-test is used to verify the difference in scale. Empirical evaluation demonstrates that the method effectively identifies OD flow outliers in an interactive way. Furthermore, by considering the overall structure of the data, the method offers new perspectives, such as spatial extent, when comparing two OD flow datasets in terms of scale.
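
To make the center-outward ordering concrete, the sketch below scores synthetic OD flows, each treated as a point (origin x, origin y, destination x, destination y) in R^4, with an approximate Tukey halfspace depth and, for comparison, the Mahalanobis distance. It is an illustration under stated assumptions, not the authors' implementation: the depth is approximated with random projection directions, the data are simulated, and every function and variable name (`approx_halfspace_depth`, `mahalanobis_distance`, `flows`) is hypothetical.

```python
import numpy as np


def approx_halfspace_depth(points, data, n_dirs=500, seed=0):
    """Approximate Tukey halfspace depth of each row of `points` w.r.t. `data`.

    For each random unit direction, the data are projected onto a line and the
    smaller of the two tail fractions at the query point is recorded; the depth
    estimate is the minimum over all directions (an upper bound on true depth).
    """
    rng = np.random.default_rng(seed)
    n, d = data.shape
    dirs = rng.normal(size=(n_dirs, d))
    dirs /= np.linalg.norm(dirs, axis=1, keepdims=True)
    proj_data = data @ dirs.T    # shape (n, n_dirs)
    proj_pts = points @ dirs.T   # shape (m, n_dirs)
    depth = np.ones(len(points))
    for k in range(n_dirs):
        sorted_k = np.sort(proj_data[:, k])
        # number of data projections <= query projection
        below = np.searchsorted(sorted_k, proj_pts[:, k], side="right")
        # number of data projections >= query projection
        above = n - np.searchsorted(sorted_k, proj_pts[:, k], side="left")
        depth = np.minimum(depth, np.minimum(below, above) / n)
    return depth


def mahalanobis_distance(points, data):
    """Mahalanobis distance of each row of `points` from the mean of `data`."""
    mean = data.mean(axis=0)
    cov_inv = np.linalg.inv(np.cov(data, rowvar=False))
    diff = points - mean
    return np.sqrt(np.einsum("ij,jk,ik->i", diff, cov_inv, diff))


# Synthetic OD flows (assumption: 300 typical flows plus one anomalous flow).
rng = np.random.default_rng(42)
flows = rng.normal(size=(300, 4))
flows = np.vstack([flows, [8.0, 8.0, -8.0, -8.0]])  # anomalous flow, last row

depths = approx_halfspace_depth(flows, flows)
distances = mahalanobis_distance(flows, flows)
center_depth = approx_halfspace_depth(np.zeros((1, 4)), flows)[0]

# Low depth means outlying; high depth means central.
print("depth of anomalous flow:", depths[-1])
print("depth of the origin (near the center):", center_depth)
print("index of largest Mahalanobis distance:", int(distances.argmax()))
```

Sorting flows by increasing depth gives an outlyingness ranking that does not rely on a covariance estimate, whereas the Mahalanobis distance depends on the sample mean and covariance, which outliers themselves can distort. Several points can share the minimum depth of 1/n, so ties should be expected at the outer edge of the data cloud.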
