Environmental data mining and modeling based on machine learning algorithms and geostatistics

Abstract The paper presents contemporary approaches to spatial environmental data analysis. It concentrates on decision-oriented problems of environmental spatial data mining and modeling: valorization and representativity of data with the help of exploratory data analysis, spatial prediction, probabilistic and risk mapping, and the development and application of conditional stochastic simulation models. The innovative part of the paper presents an integrated (hybrid) model, machine learning residuals sequential simulations (MLRSS). The model uses multilayer perceptron and support vector regression ML algorithms to model long-range spatial trends, followed by sequential simulations of the residuals. ML algorithms deliver non-linear solutions for spatially non-stationary problems, which are difficult for the geostatistical approach. Geostatistical tools (variography) are used to characterize the performance of the ML algorithms by analyzing the quality and quantity of the spatially structured information they extract from the data. Sequential simulations provide an efficient assessment of uncertainty and spatial variability. A case study of Chernobyl fallout illustrates the performance of the proposed model. It is shown that probability mapping, provided by the combination of data-driven ML and model-based geostatistical approaches, can be used efficiently in the decision-making process.
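The trend-plus-residuals idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. The data are synthetic, and a wide Gaussian kernel smoother stands in for the multilayer perceptron or support vector regression trend model. The function names `kernel_trend` and `empirical_semivariogram`, the bandwidth, and the lag settings are all assumptions chosen for illustration. The sketch covers only the trend removal and the variography check on the residuals; the sequential simulation step is not shown.

```python
import math
import random


def kernel_trend(coords, values, bandwidth):
    """Return a predictor of the long-range trend.

    Assumption: a Gaussian kernel smoother is used here only as a simple
    stand-in for the MLP / SVR trend models named in the abstract.
    """
    def predict(p):
        weights = [math.exp(-0.5 * (math.dist(p, c) / bandwidth) ** 2)
                   for c in coords]
        return sum(w * v for w, v in zip(weights, values)) / sum(weights)
    return predict


def empirical_semivariogram(coords, values, lag_width, n_lags):
    """Classical estimator: mean of 0.5 * (z_i - z_j)^2 per distance bin."""
    sums = [0.0] * n_lags
    counts = [0] * n_lags
    n = len(values)
    for i in range(n):
        for j in range(i + 1, n):
            k = int(math.dist(coords[i], coords[j]) // lag_width)
            if k < n_lags:
                sums[k] += 0.5 * (values[i] - values[j]) ** 2
                counts[k] += 1
    return [s / c if c else float("nan") for s, c in zip(sums, counts)]


# Synthetic non-stationary field: linear trend along x plus unit-variance noise.
rng = random.Random(0)
coords = [(rng.uniform(0, 100), rng.uniform(0, 100)) for _ in range(200)]
values = [0.1 * x + rng.gauss(0, 1) for x, _ in coords]

# Step 1: model the long-range trend and compute residuals at the data points.
trend = kernel_trend(coords, values, bandwidth=15.0)
residuals = [v - trend(c) for c, v in zip(coords, values)]

# Step 2: variography of raw data vs. residuals.
gamma_raw = empirical_semivariogram(coords, values, lag_width=10.0, n_lags=6)
gamma_res = empirical_semivariogram(coords, residuals, lag_width=10.0, n_lags=6)

for k, (g_raw, g_res) in enumerate(zip(gamma_raw, gamma_res)):
    print(f"lag {k * 10:3d}-{(k + 1) * 10:3d}: raw={g_raw:.2f} residual={g_res:.2f}")
```

In this toy setting the raw semivariogram keeps growing with distance because of the trend, while the residual semivariogram stays much flatter. That is the kind of diagnostic the abstract describes: variography shows how much spatially structured information the trend model has extracted. In a full workflow the residuals would then be passed to a sequential simulation algorithm, and the simulated residuals added back to the trend.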
