Understanding the temporal evolution of COVID-19 research through machine learning and natural language processing

The outbreak of the novel coronavirus disease 2019 (COVID-19), caused by the severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), has been continuously affecting human lives and communities around the world in many ways, from cities under lockdown to new social experiences. Although in most cases COVID-19 results in mild illness, it has drawn global attention due to the extremely contagious nature of SARS-CoV-2. Governments and healthcare professionals, along with people and society as a whole, have taken many measures to break the chain of transmission and flatten the epidemic curve. In this study, we used multiple data sources, i.e., PubMed and ArXiv, and built several machine learning models to characterize the landscape of current COVID-19 research by identifying the latent topics and analyzing the temporal evolution of the extracted research themes, publication similarity, and sentiments within the time frame of January–May 2020. Our findings confirm that the types of research available in PubMed and ArXiv differ significantly, with the former exhibiting greater diversity in terms of COVID-19 related issues and the latter focusing more on intelligent systems/tools to predict/diagnose COVID-19. The special attention of the research community to the high-risk groups and people with complications was also confirmed.
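
As a rough illustration of what analyzing the temporal evolution of research themes can involve, the sketch below averages per-document topic proportions by source and month. It is a minimal sketch under stated assumptions, not the study's pipeline: the `docs` records, their topic proportions, and the helper `monthly_topic_prevalence` are all hypothetical, and the proportions stand in for the output of whatever topic model is fitted beforehand.

```python
from collections import defaultdict

# Hypothetical per-document topic proportions, as a fitted topic model
# might produce them. Values are illustrative only, not results from the study.
docs = [
    {"source": "PubMed", "month": "2020-02", "topics": [0.7, 0.2, 0.1]},
    {"source": "PubMed", "month": "2020-02", "topics": [0.5, 0.4, 0.1]},
    {"source": "PubMed", "month": "2020-03", "topics": [0.2, 0.6, 0.2]},
    {"source": "ArXiv", "month": "2020-03", "topics": [0.1, 0.1, 0.8]},
    {"source": "ArXiv", "month": "2020-04", "topics": [0.2, 0.2, 0.6]},
]


def monthly_topic_prevalence(documents):
    """Average topic proportions over documents sharing (source, month)."""
    grouped = defaultdict(list)
    for doc in documents:
        grouped[(doc["source"], doc["month"])].append(doc["topics"])

    prevalence = {}
    for key, rows in grouped.items():
        n_topics = len(rows[0])
        # Mean proportion of each topic within this (source, month) group.
        prevalence[key] = [
            sum(row[k] for row in rows) / len(rows) for k in range(n_topics)
        ]
    return prevalence


prevalence = monthly_topic_prevalence(docs)
for (source, month), means in sorted(prevalence.items()):
    print(source, month, [round(m, 2) for m in means])
```

Comparing the resulting rows across months, and between the two sources, gives a simple view of how the weight of each theme shifts over time.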
