A survey on mining multiple data sources

Advances in computer and communication technologies demand new perspectives on distributed computing environments and the development of distributed data sources for storing voluminous amounts of data. In such circumstances, mining multiple data sources to extract significant, useful patterns is considered a challenging task within the data mining community. Multi-database mining (MDM) is regarded as a promising research area, as evidenced by numerous research efforts in the recent past. Existing methods for discovering knowledge from multiple data sources fall into two broad categories: (1) mono-database mining and (2) local pattern analysis. The main intent of this survey is to explain the ideas behind these approaches and to consolidate the research contributions, along with their significance and limitations.
