An information-fusion method to identify pattern of spatial heterogeneity for improving the accuracy of estimation

While spatial autocorrelation is used in spatial sampling surveys to improve the precision of estimates of a population feature over areal units, spatial heterogeneity, used as the stratification frame of a survey, often also has a considerable effect on precision. In the context of increasingly rich spatiotemporal data, this paper proposes an information-fusion method to identify patterns of spatial heterogeneity, which can serve as an informative stratification for improving estimation accuracy. Data mining techniques are the major analytical components of the method: multivariate statistics, association analysis, decision trees, and rough sets are used for data filtering, identification of contributing factors, and examination of relationships; classification and clustering are used to identify patterns of spatial heterogeneity from auxiliary variables relevant to the goal variable, and thus to stratify the samples. These methods are illustrated and examined in a case study of the cultivable land survey in Shandong Province, China. Unlike many stratification schemes, which stratify on the goal variable alone and are therefore oversimplified, the method fuses information from multiple sources to identify patterns of spatial heterogeneity, stratifies samples at geographical units as an informative polygon map, and thereby increases the precision of estimates in the sampling survey, as demonstrated in the case study.
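
The core idea, stratifying units by clustering an auxiliary variable and then comparing estimator precision, can be sketched as follows. This is a minimal illustration rather than the paper's procedure: the data are synthetic, a simple one-dimensional k-means stands in for the classification and clustering step, the names `kmeans_1d`, `srs_variance`, and `stratified_variance` are hypothetical, and the variance formulas are the standard textbook ones for simple random sampling without replacement and for stratified sampling with proportional allocation.

```python
from statistics import mean, variance

def kmeans_1d(xs, k=2, iters=100):
    """Cluster scalar values into k groups (illustrative stand-in for the paper's clustering)."""
    lo, hi = min(xs), max(xs)
    # Deterministic initial centers spread evenly across the data range (requires k >= 2).
    centers = [lo + (hi - lo) * i / (k - 1) for i in range(k)]
    labels = [0] * len(xs)
    for _ in range(iters):
        new_labels = [min(range(k), key=lambda j: abs(x - centers[j])) for x in xs]
        for j in range(k):
            members = [x for x, lab in zip(xs, new_labels) if lab == j]
            if members:
                centers[j] = mean(members)
        if new_labels == labels:
            break
        labels = new_labels
    return labels

def srs_variance(values, n):
    """Variance of the sample mean under simple random sampling without replacement."""
    N = len(values)
    return (1 - n / N) * variance(values) / n

def stratified_variance(values, labels, n):
    """Variance of the stratified mean under proportional allocation."""
    N = len(values)
    strata = {}
    for v, h in zip(values, labels):
        strata.setdefault(h, []).append(v)
    total = 0.0
    for members in strata.values():
        N_h = len(members)
        n_h = n * N_h / N   # proportional allocation (may be fractional here)
        W_h = N_h / N       # stratum weight
        total += W_h ** 2 * (1 - n_h / N_h) * variance(members) / n_h
    return total

# Synthetic population (assumed): auxiliary variable x and goal variable y per areal unit.
x = [1, 2, 3, 4, 5, 21, 22, 23, 24, 25]
y = [10, 11, 12, 13, 14, 50, 51, 52, 53, 54]

labels = kmeans_1d(x, k=2)               # strata derived from the auxiliary variable only
v_srs = srs_variance(y, n=4)             # precision without stratification
v_str = stratified_variance(y, labels, n=4)
print(f"SRS variance: {v_srs:.3f}, stratified variance: {v_str:.3f}")
```

Because the strata here are built from the auxiliary variable alone, the sketch also shows why the goal variable need not be known in advance: the gain in precision depends only on how well the auxiliary-based strata separate units with different goal values.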
