Data mining techniques on satellite images for discovery of risk areas

The high mortality of cholera epidemics in less developed countries is a challenge for health facilities, which need to be equipped with epidemiological surveillance. To strengthen surveillance capacity, this paper focuses on processing remote sensing satellite data with data mining methods to discover risk areas of the epidemic disease by connecting environment, climate and health. These satellite data are combined with field data collected over the same periods in order to explain and deduce the causes of the epidemic's evolution from one period to another in relation to the environment. The existing techniques (algorithms) for processing satellite images are mature and efficient, so the challenge today is to provide the most suitable means for the best interpretation of the results obtained. To that end, we focus on a supervised classification algorithm to process a set of satellite images of the same area taken at different periods. A novel research methodology (describing pre-treatment, data mining, and post-treatment) is proposed to ensure suitable means for transforming data, generating information and extracting knowledge. This methodology consists of six phases: (1.A) acquisition of field information about the epidemic, (1.B) satellite data acquisition, (2) selection and transformation of data (data derived from images), (3) remote sensing measurements, (4) discretization of data, (5) data treatment, and (6) interpretation of results. The main contributions of the paper are to establish the nature of the links between the environment and the epidemic, and to highlight the risky environments where public awareness of the problem and prevention policies are absolutely necessary to mitigate the propagation and emergence of the epidemic. This will allow national governments, local authorities and public health officials to manage the epidemic effectively according to risk areas.
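
As an illustration of phase (4), the sketch below shows one common way to discretize a continuous measurement into ordered categories before rule mining. It uses equal-width binning, which is an assumption here and not necessarily the scheme used in the paper; the function name `discretize`, the labels, and the river-level values are all hypothetical.

```python
def discretize(values, labels=("low", "medium", "high")):
    """Map each numeric value to a label using equal-width bins.

    Equal-width binning is an assumed scheme chosen for illustration.
    """
    lo, hi = min(values), max(values)
    width = (hi - lo) / len(labels)
    result = []
    for v in values:
        if width == 0:
            # All values identical: everything falls in the first bin.
            idx = 0
        else:
            # Clamp so the maximum value lands in the last bin.
            idx = min(int((v - lo) / width), len(labels) - 1)
        result.append(labels[idx])
    return result


# Hypothetical river levels (metres), one value per observation period.
river_level_m = [1.2, 2.8, 4.9, 3.1, 5.6, 0.9]
river_level_class = discretize(river_level_m)
print(river_level_class)
```

Discretized attributes of this kind (for example a river level class per period) are what an association rule miner consumes as items.
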
The case study concerns knowledge discovery in databases related to risk areas of the cholera epidemic in the Mopti region, Mali (West Africa). The results generated by mining association rules indicate that the level of the Niger River during the wintering (rainy season) periods and some societal factors have an impact on the variation of the cholera epidemic rate in Mopti town. In 66% of cases, a high river level is associated with a high contamination rate.
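
To make the reading of such a percentage concrete, the following minimal sketch computes the support and confidence of a single association rule over a handful of discretized observations. The records, item names, and the helper `rule_metrics` are invented for illustration and are not the study's data or code; the sketch only assumes that the reported figure can be read as a rule confidence.

```python
def rule_metrics(transactions, antecedent, consequent):
    """Return (support, confidence) of the rule antecedent -> consequent."""
    antecedent, consequent = set(antecedent), set(consequent)
    n = len(transactions)
    # Number of records containing every item of the antecedent.
    n_ante = sum(1 for t in transactions if antecedent <= set(t))
    # Number of records containing antecedent and consequent together.
    n_both = sum(1 for t in transactions if (antecedent | consequent) <= set(t))
    support = n_both / n if n else 0.0
    confidence = n_both / n_ante if n_ante else 0.0
    return support, confidence


# Toy records, one per observation period (hypothetical values).
periods = [
    {"river=high", "cholera=high"},
    {"river=high", "cholera=high"},
    {"river=high", "cholera=low"},
    {"river=low", "cholera=low"},
    {"river=low", "cholera=low"},
    {"river=medium", "cholera=low"},
]

support, confidence = rule_metrics(periods, {"river=high"}, {"cholera=high"})
print(f"support={support:.2f}, confidence={confidence:.2f}")
```

In this toy data, two of the three high-river periods also show a high cholera rate, so the rule holds in about two thirds of the cases where its antecedent applies.
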
