A bibliometric analysis of natural language processing in medical research

Background: Natural language processing (NLP) plays an increasingly significant role in advancing medicine, and a rich body of research on NLP methods and applications for medical information processing is available. A deep analysis is needed to understand the recent development of the NLP-empowered medical research field, yet few studies have examined its research status. This study therefore aims to quantitatively assess the academic output of NLP in the medical research field.

Methods: We conducted a bibliometric analysis of NLP-empowered medical research publications retrieved from PubMed for the period 2007–2016. The analysis focused on three aspects. First, literature distribution characteristics were obtained with a statistical analysis method. Second, a network analysis method was used to reveal scientific collaboration relations. Finally, thematic discovery and evolution were examined using an affinity propagation clustering method.

Results: A total of 1405 NLP-empowered medical research publications were published during the 10 years, with an average annual growth rate of 18.39%. The 10 most productive publication sources together contributed more than 50% of the total publications. The USA had the highest number of publications. A moderately significant correlation between a country's publications and GDP per capita was revealed. Denny, Joshua C was the most productive author, and Mayo Clinic was the most productive affiliation. The annual co-affiliation and co-country rates reached 64.04% and 15.79% in 2016, respectively. Ten main thematic areas were identified, including Computational biology, Terminology mining, Information extraction, Text classification, Social medium as data source, and Information retrieval.

Conclusions: A bibliometric analysis of NLP-empowered medical research publications is presented to uncover the recent research status of the field. The results can assist relevant researchers, especially newcomers, in understanding the research development systematically, seeking scientific cooperation partners, optimizing research topic choices, and monitoring new scientific or technological activities.
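
The abstract reports an average annual growth rate and an annual co-affiliation rate without stating the formulas. The sketch below shows one plausible way such indicators could be computed. It assumes the growth rate is the mean of year-over-year rates and that a co-affiliated paper is one listing more than one distinct affiliation. The function names and the sample data are hypothetical and are not taken from the study.

```python
def average_annual_growth_rate(counts):
    """Mean of year-over-year growth rates, in percent (assumed definition)."""
    rates = [(curr - prev) / prev for prev, curr in zip(counts, counts[1:])]
    return 100 * sum(rates) / len(rates)


def co_affiliation_rate(papers):
    """Percent of papers that list more than one distinct affiliation."""
    multi = sum(1 for affiliations in papers if len(set(affiliations)) > 1)
    return 100 * multi / len(papers)


# Hypothetical yearly publication counts, not the study's data.
yearly_counts = [100, 120, 150]
print(average_annual_growth_rate(yearly_counts))

# Hypothetical affiliation lists, one list per paper.
papers = [["A"], ["A", "B"], ["B", "C"], ["C", "C"]]
print(co_affiliation_rate(papers))
```

A compound growth rate, computed from only the first and last years, is an equally common definition and would give a different value from the same counts.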
