A Systematic Review on Supervised and Unsupervised Machine Learning Algorithms for Data Science

Machine learning is growing as fast as concepts such as big data and the field of data science in general. The purpose of this systematic review was to analyze scholarly articles published between 2015 and 2018 that address or implement supervised and unsupervised machine learning techniques in different problem-solving paradigms. Using the elements of PRISMA, the review process identified 84 scholarly articles published in different journals. Of the 84 articles, 6 were published before 2015 despite their metadata indicating that they were published in 2015. The presence of these six articles in the final set was attributed to indexing errors. Among the reviewed papers, decision tree, support vector machine, and Naive Bayes algorithms appeared to be the most cited, discussed, and implemented supervised learners, while k-means, hierarchical clustering, and principal component analysis emerged as the commonly used unsupervised learners. The review also revealed other commonly used algorithms, including ensembles and reinforcement learners; future systematic reviews can focus on them because of the developments that machine learning and data science are currently undergoing.
