Analysing Effect of Database Grouping on Multi-Database Mining

Many applications need to synthesize global patterns from multiple large databases, where the applications are independent of the characteristics of local patterns. The pipelined feedback technique (PFT) appears to be the most effective technique under the local pattern analysis (LPA) approach. The goal of this paper is to analyse the effect of database grouping on multi-database mining. For this purpose, we design a database grouping algorithm. We introduce a non-local pattern analysis (NLPA) approach to multi-database mining that combines the database grouping algorithm with the pipelined feedback technique, and we assess its effectiveness. We conduct experiments on both real and synthetic databases. The experimental results show that non-local pattern analysis does not always improve the accuracy of mining global patterns in multiple databases.

Index Terms: Local pattern analysis, Multi-database mining, Non-local pattern analysis, Pipelined feedback technique, Synthesis of patterns
