Algorithms and software for data mining and machine learning: a critical comparative view from a systematic review of the literature

The technological development of society generates an ever-growing volume of information. The Internet has made this information easier to access and extract, which has encouraged efforts to discover automatically the knowledge it contains. In this context, data mining aims to discover patterns, profiles and trends in large volumes of data, and multiple learning techniques are available for that purpose. The choice of technique depends on the type of result desired and on the data available; the algorithms for these tasks date mostly from the early twentieth century and now form the basis of these new technologies. The aim of this study is to show how these techniques have developed in scientific research and to present the evolution of data mining algorithms and software in recent years. To this end, we applied the systematic literature review methodology, a structured process that identifies, evaluates, and interprets the work of researchers in a chosen field. As a result, we present a comparative analysis of the most prominent software (Alteryx, TIBCO Data Science, RapidMiner and WEKA) and its capabilities for data mining processes, together with a description of the machine learning algorithms and techniques that are currently on the rise.
