Mining Correlated Pairs of Patterns in Multidimensional Structured Databases

Structured data has become increasingly abundant in many application domains. In this paper, we propose new correlation mining problems: finding frequent and correlated pairs of patterns in structured databases. First, we consider the problem of finding all frequent and correlated pattern pairs in two-dimensional structured databases. We then study two kinds of top-k mining problems. To solve these problems efficiently, we develop a series of algorithms with powerful pruning capabilities. We also discuss the applicability of the proposed algorithms to the discovery of pattern pairs in single-dimensional and multidimensional structured databases. The effectiveness of the proposed algorithms is assessed through experiments with synthetic and real-world datasets.
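To make the problem statement concrete, the following is a minimal brute-force sketch of the "all frequent and correlated pairs" task and one top-k variant. It is not the paper's algorithm and uses no pruning. Several things here are assumptions made for illustration: the phi coefficient as the correlation measure (the abstract does not name one), the representation of each pattern by a precomputed set of record ids in which it occurs, and all function and parameter names (`phi`, `correlated_pairs`, `top_k_pairs`, `min_sup`, `min_corr`).

```python
from itertools import product
from math import sqrt
import heapq


def phi(n, sup_p, sup_q, sup_pq):
    """Phi coefficient of two binary occurrence indicators over n records.
    Assumption: the abstract does not fix a correlation measure."""
    denom = sqrt(sup_p * (n - sup_p) * sup_q * (n - sup_q))
    if denom == 0:
        # A pattern occurring in no record or in every record is constant,
        # so the coefficient is undefined; treat it as uncorrelated.
        return 0.0
    return (n * sup_pq - sup_p * sup_q) / denom


def correlated_pairs(n, dim1, dim2, min_sup, min_corr):
    """Enumerate every pair (p, q), p from dimension 1 and q from dimension 2,
    whose joint support and phi value both reach the thresholds.
    dim1 and dim2 map a pattern name to the set of record ids containing it."""
    result = []
    for (p, occ_p), (q, occ_q) in product(dim1.items(), dim2.items()):
        joint = len(occ_p & occ_q)
        if joint < min_sup:
            continue  # the pair is not frequent
        c = phi(n, len(occ_p), len(occ_q), joint)
        if c >= min_corr:
            result.append((p, q, joint, c))
    return result


def top_k_pairs(n, dim1, dim2, min_sup, k):
    """Return the k frequent pairs with the highest phi value."""
    # phi is never below -1, so min_corr=-1.0 keeps every frequent pair.
    frequent = correlated_pairs(n, dim1, dim2, min_sup, min_corr=-1.0)
    return heapq.nlargest(k, frequent, key=lambda t: t[3])


# Toy two-dimensional database with 6 records (ids 0..5).
dim1 = {"A": {0, 1, 2}, "B": {0, 1, 2, 3, 4}}
dim2 = {"X": {0, 1, 2}, "Y": {3, 4, 5}}

pairs = correlated_pairs(6, dim1, dim2, min_sup=2, min_corr=0.4)
best = top_k_pairs(6, dim1, dim2, min_sup=2, k=1)
```

In this toy input, A and X occur in exactly the same records, so that pair is the most strongly correlated one. The exhaustive loop over all pattern pairs is the cost that pruning-based algorithms aim to avoid.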
