Health Big Data Analytics: A Technology Survey

Because of the vast availability of data, the health industry has received increased attention, and a growing number of studies aim to leverage these data to improve healthcare. Health data are growing larger and more complex, and their sources have expanded tremendously to include computerized physician order entry, electronic medical records, clinical notes, medical images, cyber-physical systems, the medical Internet of Things, genomic data, and clinical decision support systems. New types of data, from sources such as social network services and genomic data, are used to build personalized healthcare systems. As a result, health data are obtained in various forms and from varied sources, contexts, and technologies, and this heterogeneity can impede proper analysis. Any analytical research must overcome these obstacles to mine the data and produce meaningful insights that save lives. In this paper, we investigate the key challenges, data sources, techniques, technologies, and future directions in the field of big data analytics in healthcare. We provide a do-it-yourself review that delivers a holistic, simplified, and easily understandable view of the various technologies used to develop an integrated health analytics application.