Challenges and opportunities for public health made possible by advances in natural language processing

Natural language processing (NLP) is a subfield of artificial intelligence devoted to the understanding and generation of language. Recent advances in NLP technologies enable rapid analysis of vast amounts of text, creating opportunities for health research and evidence-informed decision making. Analyzing and extracting data from scientific literature, technical reports, health records, social media, surveys, registries and other documents can support core public health functions. These functions include enhancing existing surveillance systems (e.g. through faster identification of diseases, risk factors and at-risk populations), disease prevention strategies (e.g. through more efficient evaluation of the safety and effectiveness of interventions) and health promotion efforts (e.g. by providing the ability to obtain expert-level answers to any health-related question). NLP is emerging as an important tool that can help public health authorities reduce the burden of health inequality and inequity in the population. This paper provides notable examples of both the potential applications of NLP in public health and the challenges of its use.