Knowledge Discovery in Spatial Planning Data: A Concept for Cluster Understanding

This paper presents a methodology for discovering comprehensible, valid, potentially innovative, and useful patterns, i.e., new knowledge, in multidimensional spatial data. Techniques from statistics, machine learning, and data mining are applied in consecutive logical steps, so that results can be visualized and validation procedures applied at each stage. The approach does not end once clusters have been found; rather, when a valid clustering has been achieved, the question is posed: “What do the clusters mean?” Symbolic machine learning methods are employed to explain the clusters through rules that use an understandable subset of the high-dimensional data variables. These rules, combined with canonical representatives of each cluster and consideration of the spatial distribution of the clusters, lead to hypotheses on emergent data structures, that is, potential new knowledge. The approach is demonstrated on an exemplary data set of German urban districts featuring seven dimensions of land use.
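
The two-stage idea described above (first find clusters, then explain them by rules over a few variables) can be illustrated with a minimal sketch. The abstract does not name the algorithms used, so everything here is an assumption chosen for brevity: plain k-means stands in for the clustering step, a single-variable threshold rule stands in for the symbolic learner, and the three-variable toy data set, the feature names, and the helper functions are hypothetical.

```python
# Hypothetical toy data: one row per district, columns are land-use shares.
# The real data set has seven land-use dimensions; three are used here.
FEATURES = ["settlement", "agriculture", "forest"]
DISTRICTS = [
    [0.60, 0.25, 0.15],
    [0.55, 0.30, 0.15],
    [0.65, 0.20, 0.15],
    [0.10, 0.60, 0.30],
    [0.15, 0.50, 0.35],
    [0.12, 0.58, 0.30],
]


def squared_distance(a, b):
    """Squared Euclidean distance between two equally long vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, start_centroids, iterations=10):
    """Plain k-means with caller-supplied start centroids (deterministic).

    Assumption: a stand-in for whatever clustering method is actually used.
    """
    centroids = [list(c) for c in start_centroids]  # do not mutate the input
    labels = []
    for _ in range(iterations):
        # Assignment step: each point goes to its nearest centroid.
        labels = [
            min(range(len(centroids)),
                key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        # Update step: each centroid becomes the mean of its members.
        for c in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def best_threshold_rule(points, labels, target):
    """Search for the rule 'feature > t' or 'feature <= t' that best
    reproduces membership in cluster `target`.

    Returns (accuracy, feature index, threshold, operator).
    Assumption: a one-variable rule is a stand-in for a symbolic learner.
    """
    best = None
    for j in range(len(points[0])):
        values = sorted({p[j] for p in points})
        # Candidate thresholds lie midway between neighbouring values.
        for low, high in zip(values, values[1:]):
            t = (low + high) / 2
            for op in (">", "<="):
                predicted = [(p[j] > t) if op == ">" else (p[j] <= t)
                             for p in points]
                hits = sum(pred == (lab == target)
                           for pred, lab in zip(predicted, labels))
                accuracy = hits / len(points)
                if best is None or accuracy > best[0]:
                    best = (accuracy, j, t, op)
    return best


# Stage 1: cluster the districts (start centroids: first and last district).
labels = kmeans(DISTRICTS, [DISTRICTS[0], DISTRICTS[-1]])

# Stage 2: explain cluster 0 by a human-readable rule.
accuracy, feature_index, threshold, operator = best_threshold_rule(
    DISTRICTS, labels, target=0)
rule = (f"IF {FEATURES[feature_index]} {operator} {threshold:.2f} "
        f"THEN cluster 0")
print(rule, f"(accuracy {accuracy:.2f})")
```

In this toy setting the rule is expressed in terms of one land-use variable, which is the kind of compact, readable description that the explanation step aims for; on real data a rule learner or decision tree would typically combine several variables, and the rule's accuracy would need to be validated.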
