A Natural Language Interface for Dissemination of Reproducible Biomedical Data Science

Software packages for medical imaging and biomedical research are proliferating. These tools enable biomedical researchers to analyze a variety of data using modern machine learning and statistical analysis techniques. While these publicly available packages are a great step towards a multiplicative increase in biomedical research productivity, many open issues remain concerning the validation and reproducibility of results. A key gap is that while domain scientists can validate the domain insights implicit in an analysis, the analysis itself is coded in a programming language, and the domain scientist may not be a programmer. Thus, there is little or no direct validation of the program that carries out the desired analysis. We propose a novel solution, building upon recent successes in natural language understanding, to address this problem. Our platform allows researchers to perform, share, reproduce, and interpret analysis pipelines and results via natural language. While this approach still requires users to have a conceptual understanding of the techniques, it removes the burden of programming syntax and thus lowers the barriers to advanced and reproducible neuroimaging and biomedical research.
