PARM—An Efficient Algorithm to Mine Association Rules From Spatial Data

Association rule mining, originally proposed for market basket data, has potential applications in many areas. Spatial data, such as remotely sensed imagery (RSI), is one promising application area. Extracting interesting patterns and rules from spatial data sets composed of images and associated ground data can be important in precision agriculture, resource discovery, and other areas. In most cases, however, spatial data sets are too large to be mined in a reasonable amount of time using existing algorithms. In this paper, we propose an efficient approach to deriving association rules from spatial data using the Peano count tree (P-tree) structure, which provides a lossless, compressed representation of spatial data. Based on P-trees, we introduce PARM, a P-tree based association rule mining algorithm that uses fast support calculation and significant pruning techniques to improve the efficiency of the rule mining process. PARM is implemented and compared with the FP-growth and Apriori algorithms. Experimental results show that our algorithm is superior for association rule mining on RSI spatial data.
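To make the idea of P-tree based support calculation concrete, the following is a minimal sketch, not the authors' implementation. It assumes that each item corresponds to a square binary bit plane whose side is a power of two, that a P-tree node stores the count of 1-bits in its quadrant, and that pure quadrants (all 0s or all 1s) are not subdivided. The names `build_ptree`, `ptree_and`, and the sample planes `plane_a` and `plane_b` are hypothetical and chosen only for illustration.

```python
from typing import List, Optional, Tuple

# A node is (count_of_ones, side_length, children).
# children is None when the quadrant is pure (all 0s or all 1s).
Node = Tuple[int, int, Optional[list]]


def build_ptree(bits: List[List[int]], r: int = 0, c: int = 0,
                size: Optional[int] = None) -> Node:
    """Build a quadrant count tree over a 2^k x 2^k binary bit plane."""
    if size is None:
        size = len(bits)
    count = sum(bits[i][j]
                for i in range(r, r + size)
                for j in range(c, c + size))
    # Pure quadrants are stored as a single node; this is the compression step.
    if count == 0 or count == size * size:
        return (count, size, None)
    h = size // 2
    kids = [build_ptree(bits, r, c, h), build_ptree(bits, r, c + h, h),
            build_ptree(bits, r + h, c, h), build_ptree(bits, r + h, c + h, h)]
    return (count, size, kids)


def ptree_and(a: Node, b: Node) -> Node:
    """AND two trees built over planes of the same size."""
    count_a, size, kids_a = a
    count_b, _, kids_b = b
    if count_a == 0 or count_b == 0:      # a pure-0 quadrant wipes out the other
        return (0, size, None)
    if count_a == size * size:            # pure-1 quadrant: result is the other
        return b
    if count_b == size * size:
        return a
    kids = [ptree_and(x, y) for x, y in zip(kids_a, kids_b)]
    count = sum(k[0] for k in kids)
    return (0, size, None) if count == 0 else (count, size, kids)


# Hypothetical 4x4 bit planes, one per item (1 = pixel has the item).
plane_a = [[1, 1, 0, 0],
           [1, 1, 0, 0],
           [1, 0, 1, 1],
           [0, 0, 1, 1]]
plane_b = [[1, 1, 1, 1],
           [1, 0, 1, 1],
           [0, 0, 0, 1],
           [0, 0, 1, 1]]

combined = ptree_and(build_ptree(plane_a), build_ptree(plane_b))
support_count = combined[0]               # root count = pixels containing both items
support = support_count / (len(plane_a) ** 2)
print(support_count, support)
```

In this sketch the support count of an itemset is read directly from the root of the ANDed tree, and whole pure quadrants are resolved without visiting individual pixels, which is one plausible reading of why the abstract describes support calculation as fast.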
