Serverless OpenHealth at data commons scale—traversing the 20 million patient records of New York’s SPARCS dataset in real-time

In a previous report, we explored the serverless OpenHealth approach to the Web as a global compute space. That approach relies on the full stack of the modern browser and, in particular, on its support for assembling applications by code injection. The opportunity and the need to expand this approach have since increased markedly, reflecting wider adoption of Open Data policies by public health agencies. Here, we describe how the serverless scaling challenge can be met through an isomorphic mapping between the remote data layer API and a local (client-side, in-browser) operator. This solution is validated with an accompanying interactive web application (bit.ly/loadsparcs) capable of real-time traversal of the 20 million patient records of New York’s Statewide Planning and Research Cooperative System (SPARCS), and is compared with alternative approaches. The results strengthen the argument that the FAIR reproducibility needed for Population Science applications in the age of P4 Medicine is particularly well served by the Web platform.
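The isomorphic mapping between a remote data API and a local operator can be illustrated with a minimal sketch. It is not the authors' implementation: it is written in Python for compactness, whereas the in-browser operator described above would be JavaScript. It assumes a Socrata-style parameter convention (`$where`, `$limit`, `$offset`), which the abstract does not specify, and the function names, field names, and endpoint URL are all hypothetical. The idea shown is that one query description can either be serialized into a remote request or evaluated against records already held by the client.

```python
from urllib.parse import urlencode


def to_remote_url(base_url, query):
    """Serialize a query description into URL parameters.

    Assumption: the remote API accepts Socrata-style $where/$limit/$offset.
    """
    params = {}
    where = query.get("where", {})
    if where:
        # Equality filters only, joined with AND (hypothetical field names).
        params["$where"] = " AND ".join(
            f"{field}='{value}'" for field, value in sorted(where.items())
        )
    params["$limit"] = query.get("limit", 1000)
    params["$offset"] = query.get("offset", 0)
    return f"{base_url}?{urlencode(params)}"


def apply_local(records, query):
    """Evaluate the same query description against in-memory records."""
    where = query.get("where", {})
    matched = [
        record for record in records
        if all(record.get(field) == value for field, value in where.items())
    ]
    offset = query.get("offset", 0)
    limit = query.get("limit", 1000)
    return matched[offset:offset + limit]


# One query description, two execution paths.
query = {"where": {"county": "Albany"}, "limit": 2, "offset": 1}

# Hypothetical endpoint; a real deployment would point at the dataset's API.
remote_url = to_remote_url("https://example.org/resource/dataset.json", query)

# Hypothetical records standing in for rows already cached by the client.
cached = [
    {"county": "Albany", "age_group": "30 to 49"},
    {"county": "Kings", "age_group": "50 to 69"},
    {"county": "Albany", "age_group": "70 or Older"},
    {"county": "Albany", "age_group": "0 to 17"},
]
local_rows = apply_local(cached, query)
```

Because both functions consume the same query object, a client could answer a request from its local cache when the rows are present and fall back to the remote URL otherwise, without the calling code needing to know which path was taken.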
