Interactive Healthcare Big Data Analytics Platform under Simulated Performance

Big data analytics (BDA) has become increasingly important for making use of data from hospital systems. BDA enables interactive, dynamic queries over large, highly diverse volumes of real patient data, and data visualization further enriches its use in healthcare. We established a Healthcare BDA (HBDA) platform at the University of Victoria (UVic) with Compute Canada/WestGrid and the Vancouver Island Health Authority (VIHA), Victoria, BC, Canada. The framework was a proof-of-concept implementation tested on emulated patient data representative of the main hospital system at VIHA. We cross-referenced all data, along with its profiles and metadata, against existing clinical reporting. The HBDA platform and its performance were tested in simulation for different patient query types, with the data ingested into the Hadoop file system and queried through Apache Spark with the Zeppelin and Jupyter web-based interfaces and through Apache Drill. Ingesting one billion records took circa 2 hours via Apache Spark. Apache Drill outperformed Spark/Zeppelin and Spark/Jupyter; however, it was restricted to simpler queries and was very limited in its visualizations, giving it poor usability for healthcare. Zeppelin running on Spark offered easy-to-use interactions for health applications, but its interface tools lacked flexibility and it required extra setup time before running queries. Jupyter on Spark offered high-performance stacks on our HBDA platform and could also run all queries simultaneously, with high usability for a variety of reporting requirements by providers and health professionals.
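
To put the reported ingestion figure in perspective, the short sketch below converts "one billion records in circa 2 hours" into an average rate. The function name `ingestion_throughput` is hypothetical and used only for illustration. The calculation treats the duration as exactly 2 hours, so the result is an approximation rather than a measured throughput.

```python
def ingestion_throughput(records: int, hours: float) -> float:
    """Return the average number of records ingested per second."""
    if hours <= 0:
        raise ValueError("hours must be positive")
    return records / (hours * 3600.0)


# Assumption: "circa 2 hours" is taken as exactly 2.0 hours.
rate = ingestion_throughput(1_000_000_000, 2.0)
print(f"{rate:,.0f} records/s")  # prints "138,889 records/s"
```

Under that assumption, the platform ingested on the order of 139,000 records per second on average.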
