Ensemble Methods in Environmental Data Mining

Environmental data mining is the nontrivial process of identifying valid, novel, and potentially useful patterns in data from the environmental sciences. This chapter proposes ensemble methods for environmental data mining that combine the outputs of multiple classification models to obtain better results than any individual model could achieve. The study covers several ensemble strategies alongside the standard single classifiers widely used in the literature: decision tree, naive Bayes, support vector machine (SVM), and k-nearest neighbor (KNN). This is the first study that compares four ensemble strategies for environmental data mining: (i) bagging, (ii) bagging combined with random feature subset selection (the random forest algorithm), (iii) boosting (the AdaBoost algorithm), and (iv) voting of different algorithms. In the experimental studies, the ensemble methods are tested on real-world environmental datasets covering subjects such as air, ecology, rainfall, and soil.
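
As a rough illustration of how an ensemble combines the outputs of several models, the following minimal Python sketch implements strategy (i), bagging, with a majority vote over the base models. It is not the chapter's implementation: the decision-stump base learner, the function names (`majority_vote`, `fit_stump`, `fit_bagging`), and the tiny temperature/humidity dataset are all assumptions invented for this example.

```python
import random
from collections import Counter


def majority_vote(labels):
    """Return the most frequent label (ties go to the label seen first)."""
    return Counter(labels).most_common(1)[0][0]


def fit_stump(X, y):
    """Fit a decision stump: one feature, one threshold, chosen to minimise
    training errors by exhaustive search. Hypothetical base learner."""
    best = None
    for j in range(len(X[0])):
        for t in sorted({row[j] for row in X}):
            left = [lab for row, lab in zip(X, y) if row[j] <= t]
            right = [lab for row, lab in zip(X, y) if row[j] > t]
            left_label = majority_vote(left)
            right_label = majority_vote(right) if right else left_label
            errors = sum(lab != left_label for lab in left)
            errors += sum(lab != right_label for lab in right)
            if best is None or errors < best[0]:
                best = (errors, j, t, left_label, right_label)
    _, j, t, left_label, right_label = best
    return lambda row: left_label if row[j] <= t else right_label


def fit_bagging(X, y, n_models=15, seed=0):
    """Bagging: train each base model on a bootstrap sample (drawn with
    replacement) and combine their predictions by majority vote."""
    rng = random.Random(seed)
    n = len(X)
    models = []
    for _ in range(n_models):
        idx = [rng.randrange(n) for _ in range(n)]
        models.append(fit_stump([X[i] for i in idx], [y[i] for i in idx]))
    return lambda row: majority_vote([m(row) for m in models])


# Made-up toy data: (temperature in degrees C, relative humidity in %) -> label.
X = [(30, 25), (32, 20), (29, 35), (33, 30),
     (14, 80), (12, 90), (17, 75), (18, 85)]
y = ["dry"] * 4 + ["rain"] * 4

predict = fit_bagging(X, y)
print(predict((35, 10)), predict((10, 95)))
```

Voting of different algorithms, strategy (iv), reuses the same `majority_vote` step but feeds it predictions from differently structured classifiers trained on the same data, rather than from one algorithm trained on resampled data.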
