Mining changing regions from access-constrained snapshots: a cluster-embedded decision tree approach

Change detection on spatial data is important in many applications, such as environmental monitoring. Given a set of snapshots of spatial objects taken at different time instants, a user may want to derive the regions that changed between any two snapshots. Most existing methods need at least one of the original data sets to detect changing regions. However, in some important applications, data access constraints such as privacy concerns and limited online availability of data mean that the original data may not be available for change analysis. In this paper, we tackle the problem by proposing a simple yet effective model-based approach. In the model construction phase, each data snapshot is summarized by a concise model, a novel cluster-embedded decision tree. Once the models are built, the original data snapshots are no longer accessed. In the change detection phase, to mine the changing regions between any two instants, we compare the two corresponding cluster-embedded decision trees. Systematic experiments on both real and synthetic data sets show that our approach detects changes accurately and effectively.
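The two-phase idea can be illustrated with a minimal sketch. It is not the cluster-embedded decision tree of the paper: it assumes 2-D points with a class label, summarizes each snapshot with a plain axis-aligned decision tree (Gini splits), and keeps only the leaf rectangles as the model. The function names `build_leaves` and `changed_regions`, the labels, and the toy data are all hypothetical, chosen only for illustration.

```python
from collections import Counter

def gini(labels):
    """Gini impurity of a non-empty list of labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def build_leaves(points, box, depth=0, max_depth=4):
    """Model construction: summarize (x, y, label) points as tree leaves.

    Returns a list of (box, label) pairs, where box = (x0, y0, x1, y1).
    This is an ordinary decision tree, used here as a stand-in model.
    """
    labels = [p[2] for p in points]
    majority = Counter(labels).most_common(1)[0][0]
    if depth == max_depth or len(set(labels)) == 1:
        return [(box, majority)]
    best = None
    for axis in (0, 1):
        values = sorted({p[axis] for p in points})
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2  # candidate threshold between neighbours
            left = [p for p in points if p[axis] <= t]
            right = [p for p in points if p[axis] > t]
            score = (len(left) * gini([p[2] for p in left])
                     + len(right) * gini([p[2] for p in right]))
            if best is None or score < best[0]:
                best = (score, axis, t, left, right)
    if best is None:  # all points share the same coordinates
        return [(box, majority)]
    _, axis, t, left, right = best
    x0, y0, x1, y1 = box
    if axis == 0:
        left_box, right_box = (x0, y0, t, y1), (t, y0, x1, y1)
    else:
        left_box, right_box = (x0, y0, x1, t), (x0, t, x1, y1)
    return (build_leaves(left, left_box, depth + 1, max_depth)
            + build_leaves(right, right_box, depth + 1, max_depth))

def changed_regions(leaves_a, leaves_b):
    """Change detection: compare two models without touching the data.

    Reports every rectangle where the two models overlap but disagree.
    """
    out = []
    for (ax0, ay0, ax1, ay1), label_a in leaves_a:
        for (bx0, by0, bx1, by1), label_b in leaves_b:
            x0, y0 = max(ax0, bx0), max(ay0, by0)
            x1, y1 = min(ax1, bx1), min(ay1, by1)
            if label_a != label_b and x0 < x1 and y0 < y1:
                out.append(((x0, y0, x1, y1), label_a, label_b))
    return out

# Toy snapshots on a 10 x 10 grid: the "forest" boundary moves from x < 5 to x < 3.
domain = (0, 0, 10, 10)
snapshot_t1 = [(x, y, "forest" if x < 5 else "urban") for x in range(10) for y in range(10)]
snapshot_t2 = [(x, y, "forest" if x < 3 else "urban") for x in range(10) for y in range(10)]

model_t1 = build_leaves(snapshot_t1, domain)
model_t2 = build_leaves(snapshot_t2, domain)
del snapshot_t1, snapshot_t2  # from here on only the models are used

for region, before, after in changed_regions(model_t1, model_t2):
    print(region, before, "->", after)
```

In this sketch the comparison step works purely on leaf rectangles, which mirrors the constraint that the original snapshots are unavailable once the models exist. How the paper embeds clusters in the tree and scores changes is not reproduced here.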
