A new Internet of Things architecture for real-time prediction of various diseases using machine learning on big data environment

A number of technologies enabled by the Internet of Things (IoT) have been used for the prevention of various chronic diseases, and continuous, real-time tracking is a particularly important one. Wearable medical devices with sensors, health clouds and mobile applications continuously generate a huge amount of data, often called streaming big data. Because the data are generated at such high speed, it is difficult to collect, process and analyze them in real time, whether to take immediate action in emergencies or to extract hidden value, using traditional methods, which are limited and time-consuming. Therefore, there is a significant need for real-time big data stream processing to ensure an effective and scalable solution. To overcome this issue, this work proposes a new architecture for a real-time health status prediction and analytics system using big data technologies. The system focuses on applying a distributed machine learning model to streaming health data events ingested into Spark Streaming through Kafka topics. Firstly, we transform the standard decision tree (DT) (C4.5) algorithm into a parallel, distributed, scalable and fast DT using Spark instead of Hadoop MapReduce, which is limited for real-time computing. Secondly, this model is applied to streaming data on various diseases coming from distributed sources to predict health status. Based on several input attributes, the system predicts health status, sends an alert message to care providers and stores the details in a distributed database to support health data analytics and stream reporting. We measure the performance of the Spark DT against traditional machine learning tools, including Weka. Finally, performance evaluation parameters such as throughput and execution time are calculated to show the effectiveness of the proposed architecture.
The experimental results show that the proposed system can effectively process massive volumes of IoT-generated medical data in real time and predict health status for various diseases from distributed sources.
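
The abstract does not describe how the C4.5 split search is parallelized, so the following is only a minimal sketch of one common approach, not the authors' implementation. It assumes categorical attributes and plain Python lists standing in for data partitions: each partition counts (attribute value, class label) pairs independently, the counts are merged, and the C4.5 gain ratio is computed from the merged counts. The function names `partition_counts` and `gain_ratio` and the record fields are hypothetical.

```python
from collections import Counter
from math import log2


def entropy(counts):
    """Shannon entropy (in bits) of a Counter of occurrence counts."""
    total = sum(counts.values())
    return -sum((c / total) * log2(c / total) for c in counts.values() if c > 0)


def partition_counts(records, attr, label):
    """Map step (hypothetical helper): count (attribute value, class label)
    pairs within a single partition of records."""
    return Counter((r[attr], r[label]) for r in records)


def gain_ratio(merged):
    """Reduce step (hypothetical helper): C4.5 gain ratio of one categorical
    attribute, computed from the merged (value, label) counts."""
    class_counts = Counter()
    value_counts = Counter()
    for (value, lab), c in merged.items():
        class_counts[lab] += c
        value_counts[value] += c
    total = sum(class_counts.values())

    # Weighted entropy of the class label within each attribute value.
    conditional = 0.0
    for value, n in value_counts.items():
        subset = Counter({lab: c for (v, lab), c in merged.items() if v == value})
        conditional += (n / total) * entropy(subset)

    gain = entropy(class_counts) - conditional
    split_info = entropy(value_counts)  # penalizes many-valued attributes
    return gain / split_info if split_info > 0 else 0.0
```

Because the per-partition counters are simply added together, the counting step can run independently on each partition and only the small count tables need to be combined before a split is chosen. In a Spark setting the same pattern would map onto a per-partition aggregation followed by a merge, although that mapping is an assumption here.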
