A Clustering-Based Data Reduction for Very Large Spatio-Temporal Datasets

Today, huge amounts of data with spatial and temporal components are being collected from sources such as meteorological records and satellite imagery. Efficient visualisation of these datasets, and the discovery of useful knowledge from them, is therefore very challenging and has become a major economic need. Data mining has emerged as the technology for discovering hidden knowledge in very large amounts of data. Data mining techniques can also be applied to reduce the size of the raw data by extracting its useful knowledge as a set of representatives. Instead of dealing with the full raw data, we can then use these representatives for visualisation or analysis without losing important information. This paper presents a new approach to data reduction, based on several clustering techniques, to help analyse very large spatio-temporal datasets. We also present and discuss preliminary results of this approach.
