Implementation of a Big Data Accessing and Processing Platform for Medical Records in Cloud

Big data analysis has become a key factor in innovation and competitiveness. Global population growth and population aging in developed countries have driven up the usage rate of national medical care. Individual medical data are usually scattered across institutions and stored in varied formats, so integrating this continually growing data is challenging. For these data platforms to scale their load capacity, they must be built on a sound architecture. Several issues must be considered when using cloud computing to quickly integrate big medical data into a database, so that the data can be easily analyzed, searched, and filtered to obtain valuable information. This work builds a cloud storage system with HBase on Hadoop for storing and analyzing big data of medical records, and it improves the performance of importing data into the database. The medical records are stored in an HBase database platform for big data analysis. The system performs distributed processing of medical records data through Hadoop MapReduce programs and provides keyword search, data filtering, and basic statistics functions for the HBase database. It imports medical data using two mechanisms: single-threaded Put and CompleteBulkload. The experimental results show that import performance is improved by using single-threaded Put for files smaller than 300 MB and CompleteBulkload for files larger than 300 MB. The system also provides a web interface through which users can search data, filter out meaningful information, and analyze and convert data into suitable forms that will be helpful for medical staff and institutions.
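The abstract combines two technical ideas: choosing an import mechanism by file size, and answering keyword-search and statistics queries with MapReduce. The Python below is a minimal, single-process sketch of both, not the authors' implementation. Everything in it is a hypothetical stand-in. That includes the function names, the record fields (`hospital`, `diagnosis`), the reading of "300 MB" as 300 × 1024 × 1024 bytes, and the choice to send a file of exactly 300 MB to bulk loading (the abstract leaves that case open). In a real HBase deployment, the `"put"` branch would correspond to client-side Put calls. The `"bulkload"` branch would correspond to writing HFiles and loading them with the completebulkload tool.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, Iterator, List, Tuple

# Hypothetical threshold based on the reported crossover point.
# Assumption: "300 MB" means 300 * 1024 * 1024 bytes.
BULKLOAD_THRESHOLD_BYTES = 300 * 1024 * 1024

Record = Dict[str, str]
Mapper = Callable[[Record], Iterable[Tuple[str, int]]]


def choose_import_method(file_size_bytes: int,
                         threshold: int = BULKLOAD_THRESHOLD_BYTES) -> str:
    """Pick an import strategy from the input file size.

    Files below the threshold use single-threaded Put; files at or above it
    use bulk loading. (Behaviour at exactly the threshold is an assumption.)
    """
    if file_size_bytes < 0:
        raise ValueError("file size must be non-negative")
    return "put" if file_size_bytes < threshold else "bulkload"


def map_reduce(records: Iterable[Record],
               mapper: Mapper,
               reducer: Callable[[str, List[int]], int]) -> Dict[str, int]:
    """Simulate map -> shuffle -> reduce in one process (no Hadoop involved)."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for record in records:                 # map phase
        for key, value in mapper(record):
            groups[key].append(value)      # shuffle: group values by key
    # reduce phase: one reducer call per distinct key
    return {key: reducer(key, values) for key, values in groups.items()}


def keyword_count_mapper(keyword: str, text_field: str,
                         group_by: str) -> Callable[[Record], Iterator[Tuple[str, int]]]:
    """Hypothetical mapper: emit (group value, 1) for each record whose
    text_field contains the keyword (case-insensitive)."""
    needle = keyword.lower()

    def mapper(record: Record) -> Iterator[Tuple[str, int]]:
        if needle in record.get(text_field, "").lower():
            yield record.get(group_by, "unknown"), 1

    return mapper


def sum_reducer(_key: str, values: List[int]) -> int:
    """Basic statistic: count of matching records per key."""
    return sum(values)


if __name__ == "__main__":
    sample = [
        {"patient_id": "p1", "hospital": "A", "diagnosis": "Type 2 diabetes"},
        {"patient_id": "p2", "hospital": "B", "diagnosis": "Hypertension"},
        {"patient_id": "p3", "hospital": "A", "diagnosis": "Diabetes follow-up"},
    ]
    print(choose_import_method(100 * 1024 * 1024))  # small file -> put
    print(map_reduce(sample, keyword_count_mapper("diabetes", "diagnosis", "hospital"),
                     sum_reducer))
```

The in-process `map_reduce` only mirrors the data flow. On a Hadoop cluster, map tasks run in parallel on the nodes holding the data, and the shuffle moves grouped keys across the network to reducers. Filtering is the same pattern with a mapper that emits only records matching a predicate.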