A Spatial Entropy‐Based Decision Tree for Classification of Geographical Information

A decision tree is a classification algorithm that automatically derives a hierarchy of partition rules with respect to a target attribute of a large dataset. However, conventional decision trees underperform on geographical datasets because they do not take the spatial distribution of the data, and in particular its spatial autocorrelation, into account. This paper introduces a spatial decision tree based on a spatial diversity coefficient that measures the spatial entropy of a geo-referenced dataset. The principle is to account for spatial autocorrelation in the classification process through a notion of spatial entropy that extends the conventional notion of entropy. The resulting spatial entropy-based decision tree integrates the spatial autocorrelation component and yields a classification process adapted to geographical data. A case study on the classification of an agricultural dataset in China illustrates the potential of the proposed approach.
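
The contrast between conventional entropy and a spatial entropy can be illustrated with a small sketch. This is not the paper's implementation. It assumes a spatial diversity coefficient that weights each class term of Shannon entropy by the ratio of the mean distance between entities of the same class (intra-distance) to the mean distance between entities of that class and all others (extra-distance). The function names, the fallback for degenerate classes, and the sample coordinates are hypothetical and chosen only for illustration.

```python
import math
from collections import Counter
from itertools import combinations, product


def shannon_entropy(labels):
    """Conventional entropy: depends only on class proportions."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def spatial_entropy(points, labels):
    """Entropy with each class term weighted by intra/extra distance.

    Assumed form: H_s = -sum_i (d_int_i / d_ext_i) * p_i * log2(p_i).
    """
    n = len(labels)
    total = 0.0
    for cls, count in Counter(labels).items():
        same = [p for p, l in zip(points, labels) if l == cls]
        other = [p for p, l in zip(points, labels) if l != cls]
        if len(same) < 2 or not other:
            # Assumption: a class with no intra- or extra-distance
            # gets a neutral weight of 1 (plain Shannon term).
            ratio = 1.0
        else:
            intra = [math.dist(a, b) for a, b in combinations(same, 2)]
            extra = [math.dist(a, b) for a, b in product(same, other)]
            d_int = sum(intra) / len(intra)
            d_ext = sum(extra) / len(extra)
            ratio = d_int / d_ext
        p = count / n
        total -= ratio * p * math.log2(p)
    return total


# Two hypothetical layouts with identical class proportions.
labels = ["A"] * 4 + ["B"] * 4
clustered = [(0, 0), (0, 1), (1, 0), (1, 1),
             (10, 10), (10, 11), (11, 10), (11, 11)]
interleaved = [(0, 0), (1, 1), (2, 0), (3, 1),
               (0, 1), (1, 0), (2, 1), (3, 0)]

h_plain = shannon_entropy(labels)
h_clustered = spatial_entropy(clustered, labels)
h_interleaved = spatial_entropy(interleaved, labels)
```

Under these assumptions, both layouts have the same conventional entropy, since the class proportions are identical, while the spatially weighted value is lower for the clustered layout than for the interleaved one. A tree-building procedure could then select splits by a gain computed from such a spatial entropy instead of the conventional one.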
