Real-Time Discovery Services over Large, Heterogeneous and Complex Healthcare Datasets Using Schema-Less, Column-Oriented Methods

We present a service platform for schema-less exploration of data and discovery of patient-related statistics from healthcare data sets. The architecture of this platform is motivated by the need for fast, schema-less, and flexible approaches to SQL-based exploration and discovery of information embedded in common, heterogeneously structured healthcare data sets and their supporting components (electronic health records, practice management systems, etc.). The motivating use cases described in the paper are clinical trial candidate discovery and treatment effectiveness analysis. Following the use cases, we discuss the key features and software architecture of the platform, the underlying core components (Apache Parquet, Apache Drill, and the web services server), and the runtime profiles and performance characteristics of the platform. We conclude by showing the dramatic speedup achieved with some approaches, and the performance tradeoffs and limitations of others.
