Trends and Features of the Applications of Natural Language Processing Techniques for Clinical Trials Text Analysis

Natural language processing (NLP) is an effective tool for generating structured information from unstructured data, which is commonly found in clinical trial texts. This interdisciplinary area has gradually grown into a flourishing research field with a substantial body of accumulated scientific output. In this study, bibliographic data collected from the Web of Science, PubMed, and Scopus databases for 2001 to 2018 were investigated using three methods: performance analysis, science mapping, and, in particular, an automatic text analysis approach named structural topic modeling. Topical trend visualization and test analysis were further employed to quantify the effects of the year of publication on topic proportions. The distributions of topics across prolific countries/regions and institutions were also visualized and compared. In addition, scientific collaborations between countries/regions, institutions, and authors were explored using social network analysis. The findings are essential for advancing NLP-enhanced clinical trial text processing, boosting scientific and technological NLP-enhanced clinical trial research, and facilitating inter-country/region and inter-institution collaborations.
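
As a rough illustration of the collaboration analysis mentioned above, the following minimal Python sketch counts how often pairs of countries/regions appear together on the same paper and derives a weighted degree for each. It is not the study's actual pipeline: the function names (`collaboration_edges`, `weighted_degree`) and the toy `papers` list are hypothetical, and a real analysis would read affiliations from the bibliographic records.

```python
from collections import Counter
from itertools import combinations


def collaboration_edges(papers):
    """Count how often each pair of countries/regions co-occurs on a paper."""
    edges = Counter()
    for countries in papers:
        # Deduplicate and sort so that (A, B) and (B, A) map to the same edge.
        for a, b in combinations(sorted(set(countries)), 2):
            edges[(a, b)] += 1
    return edges


def weighted_degree(edges):
    """Sum the edge weights attached to each country/region."""
    degree = Counter()
    for (a, b), weight in edges.items():
        degree[a] += weight
        degree[b] += weight
    return degree


# Hypothetical toy data: each inner list holds the affiliation
# countries/regions of one paper (not taken from the study).
papers = [
    ["USA", "China"],
    ["USA", "UK", "China"],
    ["UK"],
    ["USA", "China", "USA"],
]

edges = collaboration_edges(papers)
degree = weighted_degree(edges)
```

Edge weights of this kind can serve as the input to a network visualization, and the same counting applies to institutions or authors if their names are substituted for countries/regions.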
