Evidence quality estimation using selected machine learning approaches

Evidence-based medicine is a practice in which medical actions and decisions are undertaken on the basis of the best available evidence-based recommendations. In this context, we propose a system for automatic grading of evidence. Evidence grading is approached as a multi-label classification task, in which the classes represent grades in the widely used Strength of Recommendation Taxonomy (SORT). Several ensemble methods were evaluated. The most successful one used Support Vector Classifiers trained on multiple high-level features, the outputs of which were used to train a Random Forest Classifier. The best accuracy achieved was 75.41%, a significant improvement over the baseline of 48% obtained by classifying all instances as the majority class. The most important predictor was found to be the publication type of the articles comprising the body of evidence. The system is tuned for use with medical publications and SORT. However, owing to its generality, it can easily be used with other evidence grading systems.
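The two-stage ensemble and the majority-class baseline described above can be illustrated with a minimal scikit-learn sketch. This is not the authors' implementation. It assumes synthetic data in place of the graded corpus, three hypothetical feature groups (`group_1` to `group_3`) standing in for the high-level features, one Support Vector Classifier per group, and out-of-fold decision values as the inputs to the Random Forest. All hyperparameters are illustrative defaults.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.dummy import DummyClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_predict, train_test_split
from sklearn.svm import SVC

# Synthetic stand-in for a graded corpus (assumption): the three classes
# play the role of SORT grades A, B and C.
X, y = make_classification(
    n_samples=600, n_features=12, n_informative=9, n_redundant=0,
    n_classes=3, random_state=0,
)

# Hypothetical feature groups; in a real system each group might come from
# a different source such as publication type or abstract text.
feature_groups = {
    "group_1": slice(0, 4),
    "group_2": slice(4, 8),
    "group_3": slice(8, 12),
}

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

train_blocks, test_blocks = [], []
for name, cols in feature_groups.items():
    svc = SVC(kernel="rbf")
    # Out-of-fold decision values, so the second stage never sees scores
    # produced by a model that was fitted on the same instance.
    train_blocks.append(
        cross_val_predict(svc, X_train[:, cols], y_train, cv=5,
                          method="decision_function")
    )
    # Refit on the full training split to score the held-out instances.
    svc.fit(X_train[:, cols], y_train)
    test_blocks.append(svc.decision_function(X_test[:, cols]))

# Stage two: a Random Forest trained on the concatenated SVC outputs.
meta_train = np.hstack(train_blocks)
meta_test = np.hstack(test_blocks)
forest = RandomForestClassifier(n_estimators=200, random_state=0)
forest.fit(meta_train, y_train)
stacked_acc = forest.score(meta_test, y_test)

# Baseline: label every instance with the majority class.
baseline = DummyClassifier(strategy="most_frequent").fit(X_train, y_train)
baseline_acc = baseline.score(X_test, y_test)

print(f"majority-class baseline accuracy: {baseline_acc:.3f}")
print(f"stacked SVC + Random Forest accuracy: {stacked_acc:.3f}")
```

The accuracies printed by this sketch depend on the synthetic data and have no bearing on the 75.41% and 48% figures reported above. After fitting, `forest.feature_importances_` gives one way to inspect which first-stage outputs the Random Forest relies on most.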
