Review on mining data from multiple data sources

Abstract: In this paper, we review recent progress in mining data from multiple data sources. Advances in information and communication technology have generated large amounts of data from different sources, which may be stored in different geographical locations. Extracting useful information from multiple data sources is considered a very challenging task in data mining, especially in the current big data era. Methods for mining multiple data sources fall mainly into four groups: (i) pattern analysis, (ii) multiple data source classification, (iii) multiple data source clustering, and (iv) multiple data source fusion. The main purpose of this review is to systematically explore the ideas behind current multiple data source mining methods and to consolidate recent research results in this field.
