Classification and disease probability prediction via machine learning programming based on multi-GPU cluster MapReduce system

This paper describes the nascent field of big health data classification and disease probability prediction on a multi-GPU cluster MapReduce platform. First, we present a novel optimization-based multi-GPU cluster MapReduce system (gcMR) that is general-purpose and suited to processing big health data. Second, we propose a new method, IVP-SVM, to address inaccuracy in big health data classification and disease probability prediction. To illustrate the power and flexibility of the gcMR platform, we describe applications of IVP-SVM on gcMR to a broad class of big health data. Experimental results show that gcMR improves average computing efficiency by 1.8- to 13.5-fold across different health applications compared with other multi-GPU MapReduce platforms, and that the accuracy of the proposed IVP-SVM ranges from 85 to 100% across those applications. These results motivate the use of gcMR and IVP-SVM as a big health data analytical platform and tool, respectively.
