Deep learning for detecting pulmonary tuberculosis via chest radiography: an international study across 10 countries

Purpose: Tuberculosis (TB) is one of the top 10 causes of death worldwide and disproportionately affects low-to-middle-income countries. Though the WHO recommends chest radiographs (CXRs) to facilitate TB screening efforts and the means of acquiring CXRs are generally accessible, limited expertise in CXR interpretation poses a challenge to broad implementation of TB screening in many parts of the world. To help mitigate this challenge, we developed a generalizable deep learning system (DLS) to help detect active TB and compared its performance to that of 14 radiologists from both endemic (India) and non-endemic (US) practice settings.

Materials and Methods: We trained a DLS using CXRs from 9 countries spanning Africa, Asia, and Europe. To improve generalization, we incorporated large-scale CXR pretraining, attention pooling, and semi-supervised learning via “noisy student”. The DLS was evaluated on a combined test set spanning sites in China, India, US, and Zambia, with all positives confirmed via microbiology or nucleic acid amplification testing (NAAT). The India test set was independent of those used in training. A further independent test set from a mining population in South Africa was also used to evaluate the model. Given WHO targets of 90% sensitivity and 70% specificity, the DLS’s operating point was prespecified to favor sensitivity over specificity.

Results: Across the combined test set spanning 4 countries, the DLS’s receiver operating characteristic (ROC) curve was above all 9 India-based radiologists (who practice where TB is endemic), with an area under the curve (AUC) of 0.90 (95% CI 0.87-0.92). At the prespecified operating point, the DLS’s sensitivity (88%) was higher than that of the India-based radiologists (mean sensitivity: 75%, range 69-87%, p<0.001 for superiority), and the DLS’s specificity (79%) was non-inferior to that of these radiologists (mean specificity: 84%, range 78-88%, p=0.004).
Similar trends were observed within HIV positive and sputum smear positive subgroups, and in the additional South Africa test set. We additionally found that the 5 US-based radiologists (who practice where TB is not endemic) who also reviewed the cases were more sensitive but less specific than the India-based radiologists. The DLS was similarly non-inferior to this second cohort of radiologists at the same prespecified operating point. Depending on the simulated setting and prevalence, use of the DLS as a prioritization tool for NAAT could reduce the cost per positive TB case detected by 40-80% compared to the use of NAAT alone.

Conclusion: We developed a DLS to detect active pulmonary TB on CXRs, which generalized to patient populations from 5 different regions of the world, and merits prospective evaluation to assist cost-effective screening efforts in settings with scarce access to radiologists. Operating point flexibility may permit customization of the DLS to account for site-specific factors such as TB prevalence, demographics, clinical resources, and customary practice patterns.

Introduction

Globally, 1 in 4 people are infected with Mycobacterium tuberculosis, and 5-10% of these individuals will develop active tuberculosis (TB) disease in their lifetime.1,2 In 2019, the estimated TB mortality was 1.4 million, including 200,000 people who were human immunodeficiency virus (HIV) positive. An estimated 2.9 million people who contracted TB were not formally reported due to a combination of underreporting, underdiagnosis, and pretreatment loss to follow-up.
Almost 90% of active TB cases occur in a few dozen “high-burden” countries, many of which lack the resources needed to tackle this public health problem.3 The anticipated rising burden of drug-resistant TB poses an increased threat to both endemic and non-endemic parts of the world.4 Lastly, the COVID-19 pandemic has also disrupted efforts to combat TB: globally, 1.4 million (21%) fewer people received care for TB in 2020 than in 2019.5

In the past decade, there has been steady global support to combat this health crisis through the World Health Organization (WHO)’s End TB Strategy, the United Nations (UN)’s Sustainable Development Goals, and the Global Fund to fight AIDS, TB and malaria.6 Cost-effective pulmonary TB screening using CXR has the potential to increase equity in access to healthcare, particularly in difficult-to-reach populations.7 In light of high patient volumes and limited access to timely expert interpretation of CXRs in many regions, there has been active research into using artificial intelligence to screen with a CXR followed by a corroborating diagnostic test.8–20,21 Such artificial intelligence-based triaging followed by GeneXpert testing for a confirmatory diagnosis was shown to be cost-effective compared to GeneXpert alone, and also substantially increased patient throughput.13

As part of its recently published 2021 guidance, the WHO evaluated three independent computer-aided detection (CAD) software systems, and determined that the diagnostic accuracy and performance of CAD software were similar to those of human readers.7,9,13,17 Given the scarcity of experienced readers, the WHO now recommends CAD as an alternative to human interpretation of CXR for both screening and triage in individuals 15 years or older.7 However, the WHO emphasized the importance of using a performant CAD system that has been tested on a population that is representative of the target population.
In this study, we developed a deep learning system (DLS) to interpret CXRs for imaging features of active TB. Developing a universal TB classifier can be challenging not only because of the array of potential imaging features, but also because the prevailing imaging features, severity of disease at presentation, and prevalence of TB and HIV can differ broadly by locale. Therefore, we validated our DLS using an aggregate of datasets from China, India, US, and Zambia that together reflect different regions, races/ethnicities, and local disease prevalence. We evaluated the DLS under two conditions: with a single prespecified operating point across all datasets, and with the operating point customized to radiologists’ performance in each locale. As diagnostic performance may be influenced by disease prevalence, we compared the DLS with two different cohorts of radiologists: one based in a TB-endemic region (India) and one based in a TB non-endemic region (United States). An analysis of HIV positive and sputum smear positive subgroups was also performed. Finally, we estimated the cost savings of using this DLS as a triaging solution for nucleic acid amplification testing (NAAT) in screening settings.
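The cost comparison mentioned at the end of the paragraph above can be illustrated with a minimal sketch. It is not the simulation used in this study: the class `ScreeningScenario`, both functions, the per-test costs, and the prevalence values are hypothetical, and NAAT is assumed to detect every true case it tests. Only the DLS sensitivity (88%) and specificity (79%) are taken from the reported results.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScreeningScenario:
    """Hypothetical screening setting; every value here is an assumption."""
    prevalence: float               # fraction of screened people with active TB
    cxr_cost: float                 # assumed cost of one CXR plus DLS read
    naat_cost: float                # assumed cost of one NAAT
    naat_sensitivity: float = 1.0   # simplification: NAAT finds every case it tests


def cost_per_case_naat_only(s: ScreeningScenario, n_screened: int = 10_000) -> float:
    """Cost per detected TB case when everyone receives NAAT."""
    cases_found = n_screened * s.prevalence * s.naat_sensitivity
    return n_screened * s.naat_cost / cases_found


def cost_per_case_with_triage(
    s: ScreeningScenario,
    dls_sensitivity: float,
    dls_specificity: float,
    n_screened: int = 10_000,
) -> float:
    """Cost per detected TB case when only DLS-flagged people receive NAAT."""
    with_tb = n_screened * s.prevalence
    without_tb = n_screened - with_tb
    # People sent on to NAAT: true positives plus false positives of the DLS.
    flagged = with_tb * dls_sensitivity + without_tb * (1.0 - dls_specificity)
    total_cost = n_screened * s.cxr_cost + flagged * s.naat_cost
    cases_found = with_tb * dls_sensitivity * s.naat_sensitivity
    return total_cost / cases_found


if __name__ == "__main__":
    # Sensitivity and specificity at the prespecified operating point (from the abstract).
    DLS_SENS, DLS_SPEC = 0.88, 0.79
    for prevalence in (0.01, 0.05, 0.10):  # illustrative prevalence values
        scenario = ScreeningScenario(prevalence, cxr_cost=1.5, naat_cost=13.0)
        baseline = cost_per_case_naat_only(scenario)
        triage = cost_per_case_with_triage(scenario, DLS_SENS, DLS_SPEC)
        saving = 100.0 * (1.0 - triage / baseline)
        print(f"prevalence={prevalence:.2f}  NAAT only={baseline:.0f}  "
              f"triage={triage:.0f}  saving={saving:.0f}%")
```

In this simplified model the saving depends on prevalence and on the ratio of the two assumed costs, which is why the result is best read as a range across settings rather than as a single figure. Triage also misses the cases the DLS does not flag, a trade-off the sketch captures only through the smaller number of cases found.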

[1]  H. S. Schaaf,et al.  Management of drug-resistant tuberculosis. , 2010, The international journal of tuberculosis and lung disease : the official journal of the International Union against Tuberculosis and Lung Disease.

[2]  Clement J. McDonald,et al.  Lung Segmentation in Chest Radiographs Using Anatomical Atlases With Nonrigid Registration , 2014, IEEE Transactions on Medical Imaging.

[3]  K. Steingart,et al.  Scoring systems using chest radiographic features for the diagnosis of pulmonary tuberculosis in adults: a systematic review , 2012, European Respiratory Journal.

[4]  Po-Hsuan Cameron Chen,et al.  Deep learning for distinguishing normal versus abnormal chest radiographs and generalization to two unseen diseases tuberculosis and COVID-19 , 2020, Scientific Reports.

[5]  Р Ю Чуйков,et al.  Vehicle detection in images of country highways based on the Single Shot MultiBox Detector method , 2017 .

[6]  E. Mohammadi,et al.  Barriers and facilitators related to the implementation of a physiological track and trigger system: A systematic review of the qualitative evidence , 2017, International journal for quality in health care : journal of the International Society for Quality in Health Care.

[7]  J. Seixas,et al.  Artificial neural network models to support the diagnosis of pleural tuberculosis in adult patients. , 2013, The international journal of tuberculosis and lung disease : the official journal of the International Union against Tuberculosis and Lung Disease.

[8]  Pulmonary TB: varying radiological presentations in individuals with HIV in Soweto, South Africa , 2017, Transactions of the Royal Society of Tropical Medicine and Hygiene.

[9]  Z. Qin,et al.  Using artificial intelligence to read chest radiographs for tuberculosis detection: A multi-site evaluation of the diagnostic accuracy of three deep learning systems , 2019, Scientific Reports.

[10]  S. Dorman,et al.  Guidance for Studies Evaluating the Accuracy of Sputum-Based Tests to Diagnose Tuberculosis. , 2019, The Journal of infectious diseases.

[11]  N. Obuchowski,et al.  Hypothesis testing of diagnostic accuracy for multiple readers and multiple tests: An anova approach with dependent observations , 1995 .

[12]  R. Piccazzo,et al.  Diagnostic Accuracy of Chest Radiography for the Diagnosis of Tuberculosis (TB) and Its Role in the Detection of Latent TB Infection: a Systematic Review , 2014, The Journal of Rheumatology. Supplement.

[13]  T. Frauenfelder,et al.  Detection of tuberculosis patterns in digital photographs of chest X-ray images using Deep Learning: feasibility study. , 2018, The international journal of tuberculosis and lung disease : the official journal of the International Union against Tuberculosis and Lung Disease.

[14]  J. Affeldt,et al.  The feasibility study , 2019, The Information System Consultant’s Handbook.

[15]  M. Muyoyeta,et al.  Active TB case finding in a high burden setting; comparison of community and facility-based strategies in Lusaka, Zambia , 2020, PloS one.

[16]  Ross B. Girshick,et al.  Focal Loss for Dense Object Detection , 2017, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[17]  Dev P. Chakraborty,et al.  Observer Performance Methods for Diagnostic Imaging: Foundations, Modeling, and Applications with R-Based Examples , 2017 .

[18]  S. Hillis A comparison of denominator degrees of freedom methods for multiple observer ROC analysis , 2007, Statistics in medicine.

[19]  Quoc V. Le,et al.  Self-Training With Noisy Student Improves ImageNet Classification , 2019, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[20]  Rick H. H. M. Philipsen,et al.  Computer aided detection of tuberculosis on chest radiographs: An evaluation of the CAD4TB v6 system , 2020, Scientific Reports.

[21]  Eui Jin Hwang,et al.  Development and Validation of a Deep Learning–based Automatic Detection Algorithm for Active Pulmonary Tuberculosis on Chest Radiographs , 2018, Clinical infectious diseases : an official publication of the Infectious Diseases Society of America.

[22]  Z. Qin,et al.  A new resource on artificial intelligence powered computer automated detection software products for tuberculosis programmes and implementers. , 2021, Tuberculosis.

[23]  Hiroshi Nishiyama,et al.  Points to consider on switching between superiority and non-inferiority. , 2006, British journal of clinical pharmacology.

[24]  L. Roberts How COVID hurt the fight against other dangerous diseases , 2021, Nature.

[25]  Y. Xiong,et al.  Automatic detection of mycobacterium tuberculosis using artificial intelligence. , 2018, Journal of thoracic disease.

[26]  Adam Wunderlich,et al.  Multireader multicase reader studies with binary agreement data: simulation, analysis, validation, and sizing , 2014, Journal of medical imaging.

[27]  B. van Ginneken,et al.  Automated chest-radiography as a triage for Xpert testing in resource-constrained settings: a prospective study of diagnostic accuracy and costs , 2015, Scientific Reports.

[28]  A. Benedetti,et al.  Chest x-ray analysis with deep learning-based software as a triage test for pulmonary tuberculosis: a prospective study of diagnostic accuracy for culture-confirmed disease. , 2020, The Lancet. Digital health.

[29]  V. Kovalev,et al.  The TB Portals: an Open-Access, Web-Based Platform for Global Drug-Resistant-Tuberculosis Data Sharing and Analysis , 2017, Journal of Clinical Microbiology.

[30]  P. Lakhani,et al.  Deep Learning at Chest Radiography: Automated Classification of Pulmonary Tuberculosis by Using Convolutional Neural Networks. , 2017, Radiology.

[31]  Clement J. McDonald,et al.  Automatic Tuberculosis Screening Using Chest Radiographs , 2014, IEEE Transactions on Medical Imaging.

[32]  George R. Thoma,et al.  A novel stacked generalization of models for improved TB detection in chest radiographs , 2018, 2018 40th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC).

[33]  Andrew Y. Ng,et al.  CheXpedition: Investigating Generalization Challenges for Translation of Chest X-Ray Algorithms to the Clinical Setting , 2020, ArXiv.

[34]  Abhishek Das,et al.  Grad-CAM: Visual Explanations from Deep Networks via Gradient-Based Localization , 2016, 2017 IEEE International Conference on Computer Vision (ICCV).

[35]  S. Vermund,et al.  A prospective study of the risk of tuberculosis among intravenous drug users with human immunodeficiency virus infection. , 1989, The New England journal of medicine.

[36]  David F. Steiner,et al.  Chest Radiograph Interpretation with Deep Learning Models: Assessment with Radiologist-adjudicated Reference Standards and Population-adjusted Evaluation. , 2019, Radiology.

[37]  Andrei Gabrielian,et al.  Performance of Qure.ai automatic classifiers against a large annotated database of patients with diverse forms of tuberculosis , 2020, PloS one.

[38]  Stefan Jaeger,et al.  Two public chest X-ray datasets for computer-aided screening of pulmonary diseases. , 2014, Quantitative imaging in medicine and surgery.

[39]  Berkman Sahiner,et al.  Hypothesis testing in noninferiority and equivalence MRMC ROC studies. , 2012, Academic radiology.