An Imbalanced Learning based MDR-TB Early Warning System

Multidrug-resistant tuberculosis (MDR-TB) is largely a man-made disease: it is mainly caused by improper treatment programs and poor patient supervision, and most cases could be prevented. Using the daily treatment and inspection records of tuberculosis (TB) cases, this study establishes a warning system that applies machine learning methods to evaluate, at an early stage, the risk that a TB patient will convert to MDR-TB. Because the historical data contain very different numbers of TB and MDR-TB cases, different imbalanced sampling strategies and classification methods were compared. The final results show that the relatively best predictions are obtained by adopting the CART-USBagg classification model in the first 90 days, half of a standardized treatment process.
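
The abstract names a CART-USBagg model without describing its construction. The following is a minimal sketch of one plausible reading, assuming "USBagg" denotes bagging over randomly undersampled, class-balanced bags with a CART-style (Gini-split) tree as the base learner. All function names, the tree depth, the number of trees, and the choice to keep every minority case in each bag are illustrative assumptions, not the authors' implementation.

```python
import random
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def build_tree(X, y, depth=0, max_depth=3):
    """Grow a small CART-style tree; a leaf is a label, a node is a tuple."""
    if depth >= max_depth or len(set(y)) == 1:
        return Counter(y).most_common(1)[0][0]
    best = None
    for j in range(len(X[0])):
        for t in sorted(set(row[j] for row in X)):
            left = [i for i, row in enumerate(X) if row[j] <= t]
            right = [i for i, row in enumerate(X) if row[j] > t]
            if not left or not right:
                continue
            # Weighted Gini impurity of the two children
            score = (len(left) * gini([y[i] for i in left])
                     + len(right) * gini([y[i] for i in right])) / len(y)
            if best is None or score < best[0]:
                best = (score, j, t, left, right)
    if best is None:  # no valid split exists
        return Counter(y).most_common(1)[0][0]
    _, j, t, left, right = best
    return (j, t,
            build_tree([X[i] for i in left], [y[i] for i in left],
                       depth + 1, max_depth),
            build_tree([X[i] for i in right], [y[i] for i in right],
                       depth + 1, max_depth))


def predict_tree(node, row):
    """Walk from the root to a leaf and return the leaf label."""
    while isinstance(node, tuple):
        j, t, left, right = node
        node = left if row[j] <= t else right
    return node


def fit_usbagg(X, y, n_estimators=11, max_depth=3, seed=0):
    """Train one tree per balanced bag (assumed USBagg scheme).

    Each bag keeps every minority-class case and adds an equally sized
    random sample (without replacement) of majority-class cases.
    """
    rng = random.Random(seed)
    pos = [i for i, label in enumerate(y) if label == 1]
    neg = [i for i, label in enumerate(y) if label == 0]
    minority, majority = (pos, neg) if len(pos) <= len(neg) else (neg, pos)
    trees = []
    for _ in range(n_estimators):
        idx = minority + rng.sample(majority, len(minority))
        trees.append(build_tree([X[i] for i in idx], [y[i] for i in idx],
                                max_depth=max_depth))
    return trees


def predict_usbagg(trees, row):
    """Majority vote over the ensemble; labels are 0 (TB) and 1 (MDR-TB)."""
    votes = sum(predict_tree(tree, row) for tree in trees)
    return 1 if 2 * votes > len(trees) else 0
```

In this sketch, each patient would be a row of numeric features derived from treatment and inspection records, with label 1 for conversion to MDR-TB and 0 otherwise. Calling `fit_usbagg(X, y)` returns the list of trees, and `predict_usbagg(trees, row)` returns the voted label for a new patient. Balancing every bag lets each tree see both classes equally often, while averaging over many bags recovers information from the majority cases that any single bag discards.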
