Many organizations hold large quantities of spatial data collected in application areas such as remote sensing, geographical information systems (GIS), astronomy, computer cartography, and environmental assessment and planning. These collections grow rapidly and can therefore be treated as spatial data streams. In data stream classification, time is a major issue, yet these spatial data sets are too large to be classified in a reasonable amount of time using existing methods. In this paper, we develop a new method for decision tree classification on spatial data streams using a data structure called the Peano Count Tree (P-tree). The P-tree is a spatial data organization that provides a lossless, compressed representation of a spatial data set and facilitates efficient classification and other data mining techniques. With the P-tree structure, measures such as information gain can be calculated quickly. We compare P-tree based decision tree induction with a classical decision tree induction method with respect to the speed at which the classifier can be built (and rebuilt when substantial amounts of new data arrive). Experimental results show that the P-tree method is significantly faster than existing classification methods, making it the preferred method for mining spatial data streams.
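The idea of computing information gain from counts can be illustrated with a minimal sketch. It assumes a P-tree is a quadrant-wise count tree over a square bit raster whose side is a power of two, in which each node stores the number of 1-bits in its quadrant and pure quadrants (all 0s or all 1s) are not subdivided. The function names `build_ptree`, `entropy`, and `information_gain` are hypothetical and are not taken from the paper; the sketch is not the authors' implementation.

```python
from math import log2


def build_ptree(bits, row=0, col=0, size=None):
    """Return (count, children) for the square quadrant at (row, col).

    Assumption: `bits` is a square list of lists of 0/1 values whose side
    length is a power of two. `children` is None for a leaf.
    """
    if size is None:
        size = len(bits)
    count = sum(
        bits[r][c]
        for r in range(row, row + size)
        for c in range(col, col + size)
    )
    # Pure quadrants (all 0s or all 1s) and single cells are stored as
    # leaves, which is where the compression comes from.
    if count == 0 or count == size * size or size == 1:
        return (count, None)
    half = size // 2
    # Child order: top-left, top-right, bottom-left, bottom-right.
    children = [
        build_ptree(bits, row + dr, col + dc, half)
        for dr in (0, half)
        for dc in (0, half)
    ]
    return (count, children)


def entropy(counts):
    """Shannon entropy (in bits) of a distribution given as raw counts."""
    total = sum(counts)
    return -sum(c / total * log2(c / total) for c in counts if c > 0)


def information_gain(class_counts, split_counts):
    """Information gain of a split, computed from counts alone.

    class_counts: number of samples in each class before the split.
    split_counts: one list of per-class counts for each attribute value.
    """
    total = sum(class_counts)
    remainder = sum(
        sum(s) / total * entropy(s) for s in split_counts if sum(s) > 0
    )
    return entropy(class_counts) - remainder
```

In this sketch the count at the root of a tree is the total number of 1-bits in the raster, so the class and attribute-value counts that `information_gain` needs could be read from tree roots instead of being recomputed by scanning the raw data.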