Redundancy Reduction for Prevalent Co-Location Patterns

Spatial co-location pattern mining is an important task in spatial data mining: it discovers subsets of spatial features whose instances are frequently observed together in nearby geographic space. However, the traditional framework for mining prevalent co-location patterns produces numerous redundant patterns, which makes the results hard for users to understand or apply. To address this issue, we study the problem of reducing redundancy in a collection of prevalent co-location patterns by utilizing the spatial distribution information of co-location instances. We first introduce the concept of semantic distance between a co-location pattern and its super-patterns, and then define redundant co-locations through the concept of δ-covered, where $\delta$ ($0 \leq \delta \leq 1$) is a coverage measure. We develop two algorithms, RRclosed and RRnull, to perform redundancy reduction for prevalent co-location patterns. The former adopts the post-mining framework commonly used by existing redundancy reduction techniques, while the latter employs a mine-and-reduce framework that pushes redundancy reduction into the co-location mining process. Our performance studies on synthetic and real-world data sets demonstrate that our method reduces the size of the original collection of closed co-location patterns by about 50 percent. Furthermore, the RRnull method runs much faster than the related closed co-location pattern mining algorithm.
