A mapping study of ensemble classification methods in lung cancer decision support systems

Achieving high classification accuracy on medical datasets is essential for researchers who aim to build effective decision support systems that assist physicians in their work. In many areas of artificial intelligence, ensemble classification methods can improve on the performance of single classifiers. This paper reports the state of the art of ensemble classification methods in lung cancer detection. We performed a systematic mapping study to identify the most relevant papers on this topic. After an automatic search of four digital libraries and a careful selection process, 65 papers published between 2000 and 2018 were selected. The results show that diagnosis was the most commonly studied task, that homogeneous ensembles and decision trees were the most frequently adopted choices for constructing ensembles, and that majority voting was the predominant combination rule. Few studies considered tuning the parameters of the techniques used. These findings open several perspectives for researchers to advance lung cancer research by addressing the identified gaps, such as investigating different classification methods, proposing other heterogeneous ensemble methods, and using new combination rules.

Graphical abstract: Main features of the mapping study performed on ensemble classification methods applied to lung cancer decision support systems.
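The two most common design choices reported above, a homogeneous ensemble built from tree learners and the majority voting combination rule, can be illustrated with a small sketch. It is not the method of any reviewed study. It assumes bagging as the way the homogeneous ensemble is built (each base learner trained on a bootstrap resample), and it uses depth-1 decision trees (stumps) instead of full trees so that it runs with the Python standard library alone. The function names `majority_vote`, `fit_stump`, `fit_bagged_stumps` and `predict`, and the one-feature toy data, are hypothetical.

```python
import random
from collections import Counter


def majority_vote(labels):
    """Majority (plurality) voting rule: return the most frequent label.
    On a tie, Counter.most_common keeps the label encountered first."""
    return Counter(labels).most_common(1)[0][0]


def fit_stump(X, y):
    """Fit a depth-1 decision tree (a simplification of a full tree): choose the
    (feature, threshold) split with the fewest training errors, predicting the
    majority label on each side of the split."""
    fallback = majority_vote(y)  # used if one side of a split is empty
    best = None
    for j in range(len(X[0])):
        for t in sorted({row[j] for row in X}):
            left = [lab for row, lab in zip(X, y) if row[j] <= t]
            right = [lab for row, lab in zip(X, y) if row[j] > t]
            left_lab = majority_vote(left) if left else fallback
            right_lab = majority_vote(right) if right else fallback
            errors = sum(lab != left_lab for lab in left) + sum(
                lab != right_lab for lab in right
            )
            if best is None or errors < best[0]:
                best = (errors, j, t, left_lab, right_lab)
    _, j, t, left_lab, right_lab = best

    def stump(row):
        return left_lab if row[j] <= t else right_lab

    return stump


def fit_bagged_stumps(X, y, n_estimators=11, seed=0):
    """Homogeneous ensemble via bagging: the same learner (a stump) is trained
    on n_estimators bootstrap resamples drawn with replacement."""
    rng = random.Random(seed)
    n = len(X)
    models = []
    for _ in range(n_estimators):
        idx = [rng.randrange(n) for _ in range(n)]
        models.append(fit_stump([X[i] for i in idx], [y[i] for i in idx]))
    return models


def predict(models, row):
    """Combine the base classifiers' labels with the majority voting rule."""
    return majority_vote([model(row) for model in models])


if __name__ == "__main__":
    # Hypothetical one-feature toy data, not drawn from any lung cancer dataset.
    X = [[1.0], [2.0], [3.0], [6.0], [7.0], [8.0]]
    y = ["benign"] * 3 + ["malignant"] * 3
    ensemble = fit_bagged_stumps(X, y, n_estimators=11, seed=0)
    print(predict(ensemble, [0.5]), predict(ensemble, [9.0]))
```

With two classes, an odd `n_estimators` rules out tied votes. Replacing the identical stumps with different learner types would turn this into a heterogeneous ensemble, and `n_estimators` is an example of the kind of parameter whose tuning was rarely reported. Libraries such as scikit-learn offer `BaggingClassifier` and `VotingClassifier` (with `voting="hard"`) for the same ideas.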
