A brief comparative study of the potentialities and limitations of machine-learning algorithms and statistical techniques

Machine learning is a popular way to find patterns and relationships in highly complex datasets. With current advances in storage and computational capabilities, some machine-learning techniques are becoming suitable for real-world applications. The aim of this work is to conduct a comparative analysis of machine-learning algorithms and conventional statistical techniques. These methods have long been used for clustering large amounts of data and extracting knowledge in a wide variety of scientific fields. However, the specialist knowledge required about the different methods and their specific requirements for the data set, as well as the limitations of the individual methods, is an obstacle to their correct use. New machine-learning algorithms could be integrated even more strongly into current evaluation practice if the right choice of method were easier to make. In the present work, several machine-learning algorithms are listed. Four methods (artificial neural network, regression method, self-organizing map, k-means algorithm) are compared in detail, and possible selection criteria are pointed out. Finally, an estimate of the fields of work and application, together with possible limitations, is provided, which should help in choosing methods for specific interdisciplinary analyses.
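To make one of the four compared methods concrete, here is a minimal sketch of k-means in its common Lloyd-style form. The algorithm alternates between two steps: it assigns each point to the nearest centroid, then moves each centroid to the mean of its assigned points. This is an illustrative assumption written in Python with NumPy, not the procedure used in this study. The function name `kmeans` and its parameters `n_iter` and `seed` are hypothetical.

```python
import numpy as np


def kmeans(X, k, n_iter=100, seed=0):
    """Illustrative Lloyd-style k-means (hypothetical helper, not the study's code).

    Alternates an assignment step and a centroid-update step until the
    centroids stop moving or n_iter iterations have run.
    """
    rng = np.random.default_rng(seed)
    # Initialise centroids by sampling k distinct data points.
    centroids = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    for _ in range(n_iter):
        # Assignment step: squared Euclidean distance from every point to every centroid.
        d2 = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        # Update step: each centroid moves to the mean of its assigned points;
        # an empty cluster keeps its previous centroid.
        new = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new, centroids):
            break
        centroids = new
    return labels, centroids


# Usage: two well-separated groups of 2-D points.
X = np.array([[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]], dtype=float)
labels, centroids = kmeans(X, k=2)
print(labels, centroids)
```

The sketch also exposes two data-set requirements of k-means. First, the number of clusters `k` must be fixed in advance. Second, distances are computed on raw feature values, so features on very different scales should usually be standardized first.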
