A K-Main Routes Approach to Spatial Network Activity Summarization

Data summarization is an important concept in data mining for finding a compact representation of a dataset. In spatial network activity summarization (SNAS), we are given a spatial network and a collection of activities (e.g., pedestrian fatality reports, crime reports), and the goal is to find k shortest paths that summarize the activities. SNAS is important for applications where observations occur along linear paths such as roadways and train tracks. SNAS is computationally challenging because of the large number of k-subsets of shortest paths in a spatial network. Previous work has focused on either geometry-based or subgraph-based approaches (e.g., a single path) and cannot summarize activities using multiple paths. This paper proposes a K-Main Routes (KMR) approach that discovers k shortest paths to summarize activities. KMR generalizes K-means to network space but uses shortest paths instead of ellipses to summarize activities. To improve performance, KMR uses network Voronoi, divide and conquer, and pruning strategies. We present a case study comparing KMR's network-based output (i.e., shortest paths) to geometry-based outputs (e.g., ellipses) on pedestrian fatality data. Experimental results on synthetic and real data show that KMR with our performance-tuning decisions yields substantial computational savings without reducing summary path coverage.
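The K-means analogy can be illustrated with a minimal sketch. It is not the paper's implementation: it assumes activities sit on nodes, it scores a candidate path by the summed network distance from a group's activities to the path (ties broken toward fewer nodes), and it enumerates all node-pair shortest paths by brute force rather than using the network Voronoi, divide and conquer, or pruning strategies. The names `dijkstra`, `shortest_path`, and `k_main_routes_sketch` are hypothetical.

```python
import heapq
from itertools import combinations

def dijkstra(graph, source):
    """Return (dist, prev) maps for shortest paths from source."""
    dist, prev, heap = {source: 0.0}, {}, [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue  # stale heap entry
        for v, w in graph[u].items():
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v], prev[v] = nd, u
                heapq.heappush(heap, (nd, v))
    return dist, prev

def shortest_path(prev, source, target):
    """Rebuild the node sequence from source to target."""
    path = [target]
    while path[-1] != source:
        path.append(prev[path[-1]])
    return tuple(reversed(path))

def k_main_routes_sketch(graph, activities, initial_paths, max_iter=20):
    """K-means-style loop whose 'centers' are shortest paths (assumed objective)."""
    nodes = sorted(graph)
    trees = {s: dijkstra(graph, s) for s in nodes}
    dist = {s: trees[s][0] for s in nodes}
    # Brute-force candidate set: one shortest path per node pair.
    candidates = [shortest_path(trees[s][1], s, t)
                  for s, t in combinations(nodes, 2)]

    def d_to_path(a, path):
        # Network distance from activity node a to the closest node on path.
        return min(dist[a][n] for n in path)

    paths, groups = list(initial_paths), []
    for _ in range(max_iter):
        # Assignment step: each activity joins its nearest summary path.
        groups = [[] for _ in paths]
        for a in activities:
            best = min(range(len(paths)), key=lambda i: d_to_path(a, paths[i]))
            groups[best].append(a)
        # Update step: pick the candidate path that best fits each group.
        new_paths = []
        for old, group in zip(paths, groups):
            if not group:
                new_paths.append(old)  # keep the old path for an empty group
                continue
            new_paths.append(min(
                candidates,
                key=lambda p: (sum(d_to_path(a, p) for a in group), len(p))))
        if new_paths == paths:
            break  # converged: no summary path changed
        paths = new_paths
    return paths, groups

# Toy road network: a chain a-b-c-d-e-f with one long link between c and d.
graph = {
    "a": {"b": 1.0}, "b": {"a": 1.0, "c": 1.0}, "c": {"b": 1.0, "d": 10.0},
    "d": {"c": 10.0, "e": 1.0}, "e": {"d": 1.0, "f": 1.0}, "f": {"e": 1.0},
}
activities = ["a", "b", "c", "d", "e", "f"]
paths, groups = k_main_routes_sketch(
    graph, activities, [("a", "b"), ("e", "f")])
print(paths, groups)
```

As with K-means, the result of such a loop depends on the initial paths, and the brute-force candidate enumeration is only practical on small graphs.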
