A Review of Data Mining Techniques and Applications

Data mining is the analytics and knowledge discovery process of analyzing large volumes of data from various sources and transforming them into useful information. Various disciplines have contributed to its development, and it is becoming increasingly important in the scientific and industrial world. This article presents a review of data mining techniques and applications from 1996 to 2016. Techniques are divided into two main categories: predictive methods and descriptive methods. Because of the huge number of publications available on this topic, only a selected number are used in this review to highlight the developments of the past 20 years. Applications are included to provide some insight into how each data mining technique has evolved over the last two decades. Recent research trends focus more on large data sets and big data. With the advent of newer algorithms, there have also been more applications in the area of health informatics.
