On discovering co-location patterns in datasets: a case study of pollutants and child cancers

We aim to identify relationships between cancer cases and pollutant emissions by proposing a novel co-location mining algorithm. Specifically, we attempt to understand whether there is a relationship between the location of a child diagnosed with cancer and any combination of chemicals emitted from facilities in that location. Co-location pattern mining aims to detect sets of spatial features that are frequently located in close proximity to each other. Most previous work in this domain is based on transaction-free Apriori-like algorithms that depend on user-defined thresholds and are designed for boolean data points. Because spatial data has no clear notion of transactions, it is nontrivial to apply association rule mining techniques to the co-location mining problem. Our proposed approach is based on a grid-based transactionization of the geographic space and is designed to mine datasets with extended spatial objects. It can also incorporate uncertainty about the existence of features, which models real-world scenarios more accurately. We eliminate the need for a global threshold by introducing a statistical test that validates the significance of candidate co-location patterns and rules. Experiments on both synthetic and real datasets show that our algorithm can detect a considerable number of statistically significant co-location patterns. In addition, we explain the data modelling framework used on real datasets of pollutants (PRTR/NPRI) and childhood cancer cases.
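To make the idea of grid-based transactionization concrete, the following is a minimal sketch and not the algorithm proposed in the paper. It assumes each spatial object is a point with a circular buffer (a simple stand-in for an extended spatial object), lays a regular grid over the space, and treats every grid point as a transaction holding the features whose buffers cover it. The function names, the tuple layout, and the feature labels are hypothetical; the uncertainty modelling and the statistical significance test described above are omitted.

```python
import math
from collections import defaultdict
from itertools import combinations


def transactionize(objects, cell_size):
    """Turn buffered spatial objects into grid-point transactions.

    objects: iterable of (feature, x, y, radius) tuples (hypothetical layout).
    Returns a dict mapping a grid index (i, j) to the set of features whose
    circular buffer contains the grid point (i * cell_size, j * cell_size).
    """
    transactions = defaultdict(set)
    for feature, x, y, radius in objects:
        # Only grid points inside the buffer's bounding box can be covered.
        i_min = math.ceil((x - radius) / cell_size)
        i_max = math.floor((x + radius) / cell_size)
        j_min = math.ceil((y - radius) / cell_size)
        j_max = math.floor((y + radius) / cell_size)
        for i in range(i_min, i_max + 1):
            for j in range(j_min, j_max + 1):
                gx, gy = i * cell_size, j * cell_size
                if (gx - x) ** 2 + (gy - y) ** 2 <= radius ** 2:
                    transactions[(i, j)].add(feature)
    return transactions


def pair_support(transactions):
    """Count, for each feature pair, the grid points where both occur."""
    counts = defaultdict(int)
    for features in transactions.values():
        for pair in combinations(sorted(features), 2):
            counts[pair] += 1
    return counts


if __name__ == "__main__":
    # Illustrative data only: one emitting facility and one cancer case.
    objects = [("benzene", 0.0, 0.0, 1.0), ("cancer", 1.0, 0.0, 1.0)]
    tx = transactionize(objects, cell_size=1.0)
    for pair, count in pair_support(tx).items():
        print(pair, count)
```

Once the space has been turned into transactions in this way, standard itemset counting applies. The raw counts here stand in only for the support step; the approach described above validates candidate patterns with a statistical test rather than a global support threshold.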
