TOBAE: A Density-based Agglomerative Clustering Algorithm

This paper presents TOBAE, a novel density-based agglomerative clustering algorithm that is parameter-free and filters noise automatically. It finds the appropriate number of clusters while achieving a competitive running time. TOBAE works by tracking the cumulative density distribution of the data points on a grid and requires only the original data set as input. It solves the clustering problem by automatically finding the optimal density threshold for the clusters. The algorithm is applicable to any N-dimensional data set, which makes it highly relevant for real-world scenarios. TOBAE improves on state-of-the-art clustering algorithms by adding automatic noise filtration around clusters. The concept behind the algorithm is explained using the analogy of puddles ('tobae'), which inspired the algorithm. This paper provides a detailed description of TOBAE along with an analysis of its time and space complexity. We present experimental results on known data sets and show that TOBAE competes with the best algorithms in the field while providing its own set of advantages.
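
To make the grid-and-threshold idea concrete, the following is a minimal sketch of generic grid-based density clustering. It is not the TOBAE algorithm itself. The function name `grid_density_clusters` and the parameters `cell_size` and `threshold` are hypothetical, and the threshold is passed in by hand here, whereas TOBAE is described as finding it automatically. Points are binned into grid cells, cells holding at least `threshold` points are treated as dense, adjacent dense cells are merged into one cluster, and points in sparse cells are marked as noise.

```python
from collections import Counter, deque
from itertools import product


def grid_density_clusters(points, cell_size, threshold):
    """Label each point with a cluster id; -1 marks noise.

    Illustrative only: cell_size and threshold are supplied by the
    caller (an assumption), not derived from the data as in TOBAE.
    """
    # Map every point to the integer coordinates of its grid cell.
    cells = [tuple(int(x // cell_size) for x in p) for p in points]
    density = Counter(cells)

    # A cell is "dense" when it holds at least `threshold` points.
    dense = {c for c, n in density.items() if n >= threshold}

    # Offsets to all neighbouring cells in N dimensions (excluding self).
    dim = len(points[0]) if points else 0
    offsets = [o for o in product((-1, 0, 1), repeat=dim) if any(o)]

    # Merge adjacent dense cells with a breadth-first flood fill.
    cell_label = {}
    next_label = 0
    for start in sorted(dense):
        if start in cell_label:
            continue
        cell_label[start] = next_label
        queue = deque([start])
        while queue:
            current = queue.popleft()
            for o in offsets:
                neighbour = tuple(a + b for a, b in zip(current, o))
                if neighbour in dense and neighbour not in cell_label:
                    cell_label[neighbour] = next_label
                    queue.append(neighbour)
        next_label += 1

    # Points whose cell is not dense receive the noise label -1.
    return [cell_label.get(c, -1) for c in cells]


points = [
    (0.1, 0.1), (0.2, 0.3), (0.4, 0.2), (1.1, 0.2), (1.3, 0.4),
    (5.1, 5.2), (5.3, 5.4), (5.2, 5.1),
    (9.5, 0.5),
]
labels = grid_density_clusters(points, cell_size=1.0, threshold=2)
```

In this toy input, the first five points fall into two adjacent dense cells and share one label, the next three form a second cluster, and the isolated last point is labeled as noise. The number of neighbour offsets grows as 3^N - 1, so this direct enumeration is only practical in low dimensions.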
