A Systematic Review of Open Source Clinical Software on GitHub for Improving Software Reuse in Smart Healthcare

The abundance of open source clinical software offers developers substantial reuse opportunities for building clinical tools faster and at lower cost. However, the lack of research on open source clinical software makes such reuse difficult in clinical software development. This paper aims to help clinical developers better understand open source clinical software through a thorough investigation of the open source clinical software hosted on GitHub. We first developed a data pipeline that automatically collects and preprocesses GitHub data. We then applied several methods, including statistical analysis, hypothesis testing, and topic modeling, to reveal the overall status and characteristics of open source clinical software. A total of 14,971 clinical-related GitHub repositories were created during the last 10 years, with an average annual growth rate of 55%. Among them, 12,919 are open source clinical software. Our analysis identifies the most popular open source clinical software by number of stars, the most productive countries contributing to the community, the factors that make open source clinical software popular, and 10 main groups of open source clinical software. The results can help both researchers and practitioners, especially newcomers, understand open source clinical software.
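The abstract reports an average annual growth rate but does not say how it was computed. As a minimal sketch, assuming yearly counts of newly created repositories are available, the two most common readings of that term can be computed as follows. The function names and the sample counts are hypothetical and are not taken from the paper.

```python
from statistics import mean


def yearly_growth_rates(counts):
    """Return the year-over-year growth rate for each consecutive pair of counts."""
    return [(curr - prev) / prev for prev, curr in zip(counts, counts[1:])]


def average_annual_growth(counts):
    """Arithmetic mean of the year-over-year growth rates."""
    return mean(yearly_growth_rates(counts))


def compound_annual_growth(counts):
    """Compound annual growth rate between the first and last yearly counts."""
    years = len(counts) - 1
    return (counts[-1] / counts[0]) ** (1 / years) - 1


# Hypothetical yearly repository counts, for illustration only (not the paper's data).
hypothetical_counts = [100, 150, 240, 360]

print(f"mean year-over-year growth: {average_annual_growth(hypothetical_counts):.1%}")
print(f"compound annual growth:     {compound_annual_growth(hypothetical_counts):.1%}")
```

The two measures agree when growth is constant and diverge when it fluctuates, so a reported figure such as 55% is easier to compare across studies when the definition is stated.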
