A Gaussian process based big data processing framework in cluster computing environment

Machine learning algorithms play a vital role in predicting disease outbreaks driven by climate change. Dengue outbreaks are caused by improper maintenance of water storage, lack of urbanization, deforestation, and lack of vaccination and awareness. Moreover, the number of dengue cases varies with the climate season, so a prediction model is needed that relates dengue outbreaks to climate change. This paper applies a Gaussian process regression (GPR) model that uses seasonal averages of climate parameters: maximum temperature, minimum temperature, precipitation, wind, relative humidity, and solar radiation. The number of dengue cases and the climate data for each block of Tamil Nadu, India, are collected from the Integrated Disease Surveillance Project and Global Weather Data for SWAT Inc, respectively. Local Moran’s I spatial autocorrelation is used to identify hotspot regions, and the dengue outbreak and its hotspots are visualized geographically with ArcGIS 10.1. The day-wise big climate data is collected and stored in a Hadoop cluster computing environment. The MapReduce framework reduces the day-wise climate data to seasonal climate averages for winter, summer, and monsoon. The seasonal climate data and the dengue incidence counts (health data) are integrated by geo-location (latitude and longitude). GPR is then used to build the dengue prediction model from the integrated climate and health data. The proposed Gaussian process based prediction model is compared with other machine learning approaches: multiple regression, support vector machine, and random forests. Experimental results demonstrate the effectiveness of the Gaussian process based prediction framework.
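The aggregation and integration steps described in the abstract can be illustrated with a minimal single-process sketch in plain Python. It is not the paper's Hadoop implementation: the month-to-season mapping, the record layout, the function names (`map_phase`, `reduce_phase`, `integrate`), and the toy values are all assumptions made for illustration only.

```python
from collections import defaultdict
from datetime import date

# ASSUMPTION: month-to-season mapping; the abstract names the seasons
# (winter, summer, monsoon) but does not define their month ranges.
SEASON_BY_MONTH = {1: "winter", 2: "winter", 3: "summer", 4: "summer", 5: "summer"}


def season_of(day):
    """Return the assumed season for a calendar date."""
    return SEASON_BY_MONTH.get(day.month, "monsoon")


def map_phase(records):
    """Emit ((lat, lon, season, parameter), value) for every daily reading."""
    for lat, lon, day, readings in records:
        for parameter, value in readings.items():
            yield (lat, lon, season_of(day), parameter), value


def reduce_phase(pairs):
    """Average all values that share the same key."""
    totals = defaultdict(lambda: [0.0, 0])  # key -> [running sum, count]
    for key, value in pairs:
        totals[key][0] += value
        totals[key][1] += 1
    return {key: total / count for key, (total, count) in totals.items()}


def integrate(seasonal_means, dengue_cases):
    """Join seasonal climate means with case counts on exact (lat, lon)."""
    rows = defaultdict(dict)
    for (lat, lon, season, parameter), mean in seasonal_means.items():
        rows[(lat, lon)][f"{season}_{parameter}"] = mean
    return {
        location: {**features, "cases": dengue_cases[location]}
        for location, features in rows.items()
        if location in dengue_cases
    }


# Made-up toy data: (lat, lon, date, {parameter: value}); not from the paper.
daily = [
    (13.08, 80.27, date(2015, 1, 10), {"tmax": 29.0, "precip": 0.0}),
    (13.08, 80.27, date(2015, 1, 11), {"tmax": 31.0, "precip": 2.0}),
    (13.08, 80.27, date(2015, 7, 1), {"tmax": 36.0, "precip": 10.0}),
]
cases = {(13.08, 80.27): 12}

integrated = integrate(reduce_phase(map_phase(daily)), cases)
```

Each row of `integrated` holds one location's seasonal climate averages together with its case count, which is the shape of table a regression model such as GPR would be trained on. The join here matches coordinates exactly; real data would likely need rounding or a nearest-station match.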
