Leveraging the power of local spatial autocorrelation in geophysical interpolative clustering

Sensor stations are now deployed worldwide to measure geophysical variables (e.g. temperature, humidity, light) for a growing number of ecological and industrial processes. Although these variables are generally measured over large areas and long (potentially unbounded) periods of time, stations cannot cover every location in space. Moreover, the data produced are too voluminous to be recorded in full for future analysis. In this scenario, summarization, i.e. the computation of aggregates of data, can reduce the amount of data stored on disk, while interpolation, i.e. the estimation of unknown data at each location of interest, can supplement station records. We present a novel data mining solution, named interpolative clustering, that addresses both tasks in time-evolving, multivariate geophysical applications. It yields a time-evolving clustering model to summarize geophysical data, and computes a weighted linear combination of cluster prototypes to predict data. Clustering accounts for the local presence of spatial autocorrelation in the geophysical data. The weights of the linear combination reflect the inverse distance of the unseen data to each cluster geometry. The cluster geometry is represented through shape-dependent sampling of the geographic coordinates of the clustered stations. Experiments on several data collections investigate the trade-off between the summarization capability and the predictive accuracy of the presented interpolative clustering algorithm.
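
The prediction step described above, a linear combination of cluster prototypes weighted by inverse distance to each cluster geometry, can be illustrated with a minimal sketch. It is not the authors' implementation. The names `distance_to_geometry` and `interpolate`, the `power` exponent, and the choice to measure distance to a cluster as the minimum Euclidean distance to its sampled coordinates are all assumptions made for illustration; the abstract does not specify them.

```python
import math


def distance_to_geometry(query, sample_points):
    """Distance from a query location to a cluster geometry.

    Assumption: the geometry is given as a list of sampled (x, y)
    coordinates, and the distance is the minimum Euclidean distance
    to any of them.
    """
    return min(math.dist(query, point) for point in sample_points)


def interpolate(query, clusters, power=2.0):
    """Predict a multivariate value at `query` from cluster prototypes.

    `clusters` is a list of (prototype, sample_points) pairs, where
    `prototype` is a list with one value per geophysical variable.
    `power` is a hypothetical inverse-distance exponent.
    """
    distances = [distance_to_geometry(query, points) for _, points in clusters]

    # A query lying exactly on a cluster geometry takes that prototype,
    # which avoids a division by zero.
    for (prototype, _), d in zip(clusters, distances):
        if d == 0.0:
            return list(prototype)

    weights = [1.0 / d ** power for d in distances]
    total = sum(weights)
    n_vars = len(clusters[0][0])
    return [
        sum(w * prototype[k] for w, (prototype, _) in zip(weights, clusters)) / total
        for k in range(n_vars)
    ]


# Two hypothetical clusters with [temperature, humidity] prototypes.
example_clusters = [
    ([10.0, 50.0], [(0.0, 0.0), (0.0, 1.0)]),
    ([20.0, 70.0], [(4.0, 0.0), (4.0, 1.0)]),
]
prediction = interpolate((1.0, 0.0), example_clusters)
```

With this weighting, a location close to one cluster is predicted almost entirely from that cluster's prototype, while a location equidistant from two clusters receives their average.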
