ScaDiPaSi: An Effective Scalable and Distributable MapReduce-Based Method to Find Patient Similarity on Huge Healthcare Networks

Abstract

Healthcare network information grows exponentially, and current database management systems cannot adequately manage data at this scale, so healthcare problems call for a “big data” solution. One of the most important of these problems is finding Patient Similarity (PaSi). Current PaSi methods are not adaptive, do not support all data sources, and do not meet users' requirements for a flexible query tool. In this paper, we propose ScaDiPaSi, a scalable and distributable method that solves PaSi problems on the MapReduce architecture. ScaDiPaSi supports timely storage and retrieval of all kinds of data sources, and its dynamic design lets users define conditions on any entered field. Our evaluation shows that the method can be used with high confidence and low execution time.
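
The abstract does not specify ScaDiPaSi's similarity measure or how its MapReduce jobs are laid out. The sketch below is therefore a hypothetical illustration, not the authors' method. It assumes patients are flat records, user conditions are (field, operator, value) triples on any field, and similarity is the fraction of the query patient's fields with equal values. The map, shuffle, and reduce phases are simulated in a single process. All names (`run_job`, `map_phase`, `reduce_phase`, `OPS`) are illustrative assumptions.

```python
from collections import defaultdict
from heapq import nlargest
import operator

# Hypothetical operators a user may attach to any field in a condition.
OPS = {"==": operator.eq, "!=": operator.ne, "<": operator.lt,
       "<=": operator.le, ">": operator.gt, ">=": operator.ge}


def satisfies(record, conditions):
    """True if the record has every field and meets each (field, op, value)."""
    return all(field in record and OPS[op](record[field], value)
               for field, op, value in conditions)


def similarity(query, other):
    """Assumed metric: share of the query's non-id fields with equal values."""
    keys = query.keys() - {"id"}
    if not keys:
        return 0.0
    return sum(1 for k in keys if other.get(k) == query[k]) / len(keys)


def map_phase(record, query, conditions):
    """Mapper: skip the query itself and records failing the conditions."""
    if record["id"] != query["id"] and satisfies(record, conditions):
        yield query["id"], (similarity(query, record), record["id"])


def reduce_phase(key, values, k):
    """Reducer: keep the k highest-scoring (score, patient id) pairs."""
    return key, nlargest(k, values)


def run_job(records, query, conditions, k=2):
    """In-process stand-in for map, shuffle (group by key), and reduce."""
    groups = defaultdict(list)
    for rec in records:
        for key, value in map_phase(rec, query, conditions):
            groups[key].append(value)
    return dict(reduce_phase(key, vals, k) for key, vals in groups.items())


# Usage: find patients similar to p1 among those aged 40 or older.
patients = [
    {"id": "p1", "sex": "F", "age": 54, "diagnosis": "T2D", "smoker": False},
    {"id": "p2", "sex": "F", "age": 61, "diagnosis": "T2D", "smoker": False},
    {"id": "p3", "sex": "M", "age": 49, "diagnosis": "T2D", "smoker": False},
    {"id": "p4", "sex": "F", "age": 30, "diagnosis": "asthma", "smoker": False},
]
result = run_job(patients, patients[0], [("age", ">=", 40)], k=2)
print(result)  # {'p1': [(0.75, 'p2'), (0.5, 'p3')]}
```

Because every pair emitted for one query shares the same key, a single reducer receives them all. On a real cluster, a combiner that keeps a local top-k on each mapper would shrink the shuffle, since top-k selection can be applied per partition and then again on the merged results.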
