Representativeness and Data Mining - Concepts and Methods of Digital Soil Mapping

Digital soil mapping of large areas is challenging when the mapping resolution should be as high as possible while sampling remains as sparse as possible. In general, the more diverse a landscape is, the more samples are required to cover the entire feature space systematically. Moreover, if soil sensing techniques such as ground-penetrating radar are combined with mapping over large areas, it is important to segment the landscape systematically and to derive representative sensing sites. Landscape segmentation as introduced in this study is the first part of a stacked sampling scheme for collecting representative soil data for digital soil property mapping, developed in the Collaborative Research Centre (SFB) 299 of the German Research Foundation (DFG). It is followed by the derivation of representative patches and transects for linearly operated soil sensing techniques. We introduce a semi-automated method that segments nominal spatial datasets based on the local spatial frequency distribution of the mapping units, with the aim of producing homogeneous, non-fragmented segments with smoothed boundaries. The methodological framework combines spatial and non-spatial techniques and relies mainly on a moving-window analysis of the frequency distribution and a k-means cluster analysis. Based on an existing soil map at a scale of 1:50,000 of the highly diverse Nidda catchment in Hesse, Germany, which covers an area of 1600 km², we derived six segments and compared them with a map of landscape units (1:200,000) that distinguishes eight main landscape units within the catchment. Comparisons of the distribution of soils and parent materials show that the proposed approach yields spatial segments that are more homogeneous in terms of feature space. Similar results were obtained when analyzing the feature spaces of different terrain attributes. Because the segmentation is based on a soil map, the resulting segments are soilscapes. These can be used not only for sampling purposes but are also relevant to a variety of environmental issues, such as biodiversity and ecosystem analyses or the characterization of hydrological units.
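The two core steps named in the abstract, a moving-window frequency analysis followed by k-means clustering, can be illustrated with a minimal sketch. It is not the study's implementation: the square window, the edge clipping, the squared Euclidean distance, the deterministic farthest-first initialization, the fixed iteration count, and all names (`window_frequencies`, `kmeans`, `segment`, `soil_map`) are assumptions made for illustration, and the toy grid is invented.

```python
from collections import Counter


def window_frequencies(grid, classes, radius):
    """Relative frequency of each map unit in a square window around every cell.

    The window is clipped at the grid edges (an assumption of this sketch).
    """
    rows, cols = len(grid), len(grid[0])
    features = []
    for r in range(rows):
        for c in range(cols):
            counts = Counter()
            for rr in range(max(0, r - radius), min(rows, r + radius + 1)):
                for cc in range(max(0, c - radius), min(cols, c + radius + 1)):
                    counts[grid[rr][cc]] += 1
            total = sum(counts.values())
            features.append([counts[k] / total for k in classes])
    return features


def sq_dist(a, b):
    """Squared Euclidean distance between two frequency vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iterations=20):
    """Plain k-means with farthest-first initialization (keeps the sketch deterministic)."""
    centers = [list(points[0])]
    while len(centers) < k:
        farthest = max(points, key=lambda p: min(sq_dist(p, c) for c in centers))
        centers.append(list(farthest))
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each cell joins the nearest cluster center.
        labels = [min(range(k), key=lambda j: sq_dist(p, centers[j])) for p in points]
        # Update step: each center becomes the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def segment(grid, k, radius=1):
    """Cluster cells by their local map-unit frequency distribution."""
    classes = sorted({unit for row in grid for unit in row})
    labels = kmeans(window_frequencies(grid, classes, radius), k)
    cols = len(grid[0])
    return [labels[i:i + cols] for i in range(0, len(labels), cols)]


# Invented toy map: unit "A" on the left, unit "B" on the right,
# plus one isolated "B" cell inside the "A" region.
soil_map = [
    ["A", "A", "A", "B", "B", "B"],
    ["B", "A", "A", "B", "B", "B"],
    ["A", "A", "A", "B", "B", "B"],
    ["A", "A", "A", "B", "B", "B"],
]
segments = segment(soil_map, k=2, radius=1)
for row in segments:
    print(row)
```

Because clustering operates on neighbourhood frequencies rather than on the unit of each single cell, the isolated "B" cell is absorbed into the surrounding segment. On this toy scale, that is the kind of non-fragmented result the method aims for. A larger window would smooth boundaries more strongly, at the cost of spatial detail.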
