Evaluation and Comparison of Different Machine Learning Methods to Predict Outcome of Tuberculosis Treatment Course

Completing the tuberculosis (TB) treatment course is crucial to protect patients against prolonged infectiousness, relapse, and the longer, more expensive therapy required for multidrug-resistant TB. Up to 50% of patients do not complete the treatment course. To address this problem, the World Health Organization included TB treatment with patient supervision and support as an element of the “global plan to stop TB”. The plan may require a model that predicts the outcome of directly observed treatment, short-course (DOTS) therapy; such a tool could then be used to decide how intensive the services and support provided to each patient should be. This work applied and compared machine learning techniques to predict the outcome of TB therapy. After feature analysis, models based on six algorithms, namely decision tree (DT), artificial neural network (ANN), logistic regression (LR), radial basis function (RBF), Bayesian networks (BN), and support vector machine (SVM), were developed and validated. The models were built on a training set (N = 4515), validated on a testing set (N = 1935), and evaluated by prediction accuracy, F-measure, and recall. Seventeen significantly correlated features were identified (P CI = 0.001 - 0.007). DT (C4.5) was the best algorithm, with 74.21% prediction accuracy, compared with 62.06%, 57.88%, 57.31%, 53.74%, and 51.36% for ANN, BN, LR, RBF, and SVM, respectively. The nature and distribution of the data may explain why DT outperformed the other algorithms. The predicted class for each TB case might help improve the quality of care by making patient supervision and support more case-sensitive, thereby enhancing the quality of DOTS therapy.
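The three evaluation measures named above can be illustrated with a minimal sketch. It assumes a binary outcome (1 = treatment course completed, 0 = not completed), which is a simplification made here for illustration and not necessarily the class definition used in the study. The function name `binary_metrics` and the toy labels are hypothetical; only the training and testing set sizes come from the abstract.

```python
from typing import Dict, Sequence


def binary_metrics(y_true: Sequence[int], y_pred: Sequence[int],
                   positive: int = 1) -> Dict[str, float]:
    """Compute accuracy, recall, precision and F-measure for binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)

    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    # F-measure: harmonic mean of precision and recall.
    denom = precision + recall
    f_measure = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "recall": recall,
            "precision": precision, "f_measure": f_measure}


# Set sizes reported in the abstract; they correspond to a 70/30 split.
N_TRAIN, N_TEST = 4515, 1935
train_fraction = N_TRAIN / (N_TRAIN + N_TEST)

# Toy labels, invented purely to exercise the function (not study data).
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 0, 1, 0, 1, 0, 1, 0]
metrics = binary_metrics(y_true, y_pred)

print(f"train fraction: {train_fraction:.2f}")
for name, value in metrics.items():
    print(f"{name}: {value:.2f}")
```

In a comparison like the one described, the same function would be applied to each model's predictions on the held-out testing set, so that all six algorithms are scored on identical cases.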
