A Parallel Spatial Co-location Mining Algorithm Based on MapReduce

Spatial association rule mining is a useful tool for discovering correlations and interesting relationships among spatial objects. Co-locations, sets of spatial events that are frequently observed together in close proximity, are particularly useful for revealing spatial dependencies among those events. Although a number of spatial co-location mining algorithms have been developed, co-location pattern discovery remains prohibitively expensive for large datasets and dense neighborhoods. We propose to leverage parallel processing, in particular the MapReduce framework, to make spatial mining more efficient. MapReduce-like systems have proven efficient for large-scale data processing on clusters of commodity machines and for big data analysis in many applications. The proposed parallel co-location mining algorithm is implemented on MapReduce, and experimental results show that its computational performance scales.
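To make the idea of a co-location concrete, the following is a minimal single-process sketch of how size-2 co-locations could be scored in a map/shuffle/reduce style. It is not the algorithm proposed in the paper, whose details the abstract does not give. The grid partitioning, the toy data, and all function names are assumptions made for illustration, and the participation index is used because it is a prevalence measure common in the co-location literature, not because the abstract names it.

```python
from collections import defaultdict
from itertools import combinations
from math import hypot

# Hypothetical toy data: (event_type, instance_id, x, y)
POINTS = [
    ("A", 1, 0.0, 0.0), ("A", 2, 5.0, 5.0),
    ("B", 1, 0.5, 0.0), ("B", 2, 5.0, 5.5), ("B", 3, 9.0, 9.0),
    ("C", 1, 0.0, 0.6),
]

def home_cell(x, y, cell):
    """Grid cell that contains the point (x, y)."""
    return (int(x // cell), int(y // cell))

def map_points(points, cell):
    """Map: send each point to its home cell and the 8 adjacent cells,
    so that neighbors across a cell border meet in the same reducer."""
    for p in points:
        cx, cy = home_cell(p[2], p[3], cell)
        for dx in (-1, 0, 1):
            for dy in (-1, 0, 1):
                yield (cx + dx, cy + dy), p

def shuffle(pairs):
    """Shuffle: group mapped values by key, as the framework would."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_cell(key, pts, cell, dist):
    """Reduce: emit neighboring instance pairs of different event types.
    A pair is reported only by the smaller of the two home cells,
    so replication never produces duplicates."""
    for a, b in combinations(sorted(pts), 2):
        if a[0] == b[0]:
            continue
        owner = min(home_cell(a[2], a[3], cell), home_cell(b[2], b[3], cell))
        if owner == key and hypot(a[2] - b[2], a[3] - b[3]) <= dist:
            yield (a[0], b[0]), (a[1], b[1])

def participation_index(points, cell, dist):
    """Participation index of every event-type pair with a neighbor pair.
    Assumes cell >= dist so all neighbors lie in adjacent cells."""
    totals = defaultdict(int)
    for event_type, *_ in points:
        totals[event_type] += 1
    seen = defaultdict(lambda: (set(), set()))
    for key, pts in shuffle(map_points(points, cell)).items():
        for (ta, tb), (ia, ib) in reduce_cell(key, pts, cell, dist):
            seen[(ta, tb)][0].add(ia)
            seen[(ta, tb)][1].add(ib)
    return {
        pair: min(len(s[0]) / totals[pair[0]], len(s[1]) / totals[pair[1]])
        for pair, s in seen.items()
    }

if __name__ == "__main__":
    for pair, pi in sorted(participation_index(POINTS, cell=1.0, dist=1.0).items()):
        print(pair, round(pi, 3))
```

In this toy data, every A instance has a B neighbor but only two of the three B instances have an A neighbor, so the pair (A, B) scores 2/3. A pair would be reported as a co-location when its score meets a user-chosen prevalence threshold. On a real cluster the map and reduce functions would run as distributed tasks, and larger patterns would need further rounds that this sketch does not cover.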
