Comparative Study of Query Performance in a Remote Health Framework using Cassandra and Hadoop

With recent advances in distributed processing, sensor networks, cloud computing and similar technologies, big data has gained importance, and a number of big data applications can now be envisaged that could not be conceptualised earlier. As technologists have focused on the storage, processing and management of big data, a number of big data solutions have emerged. The objective of this paper is to study two such solutions, namely Hadoop and Cassandra, in order to assess their suitability for healthcare applications. The paper considers a data model for a remote health framework and demonstrates mappings of the data model onto Hadoop and Cassandra. The data model follows popular national and international standards for Electronic Health Records. It is shown in the paper that, in order to obtain an efficient mapping of a given data model onto a big data solution such as Cassandra, sample queries must be considered. Health data is also stored in Hadoop as XML files, with the same set of queries in mind. The performance of these queries in Hadoop is observed first, and then the performance of executing them on the same experimental setup using Hadoop and Cassandra is compared. The experiments are designed following the guidelines of the Yahoo! Cloud Serving Benchmark (YCSB). The study provides insight into the applicability of big data solutions in the healthcare domain.