Sequential clustering with particle filters: estimating the number of clusters from data

In this paper we develop a particle-filtering approach for grouping observations into an unspecified number of clusters. Each cluster corresponds to a potential target from which the observations originate, and a potential clustering with a specified number of clusters is represented by an association hypothesis. Whenever a new report arrives, a posterior distribution over all hypotheses is calculated iteratively from a prior distribution, an update model, and a likelihood function. The update model is based on an association probability for clusters, given the probability of false detection and a derived probability of an unobserved target. The likelihood of each hypothesis is derived from the cost of associating the current report with its corresponding cluster under that hypothesis. A set of hypotheses is maintained by Monte Carlo sampling. The state space, i.e., the space of all hypotheses, is discrete, with a dimensionality that grows linearly over time. To lower the complexity further, hypotheses are combined if their clusters are close to each other in the observation space. Finally, at each time step the posterior distribution is projected onto a distribution over the number of clusters. Compared with earlier information-theoretic approaches for finding the number of clusters, this approach does not require a large number of trial clusterings, since it maintains an estimate of the number of clusters along with the cluster configuration.
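The recursion described above is the standard sequential Bayesian update over hypotheses, p(h_t | y_{1:t}) proportional to p(y_t | h_t) p(h_t | h_{t-1}) p(h_{t-1} | y_{1:t-1}), approximated with a finite particle set. The Python sketch below is only a minimal illustration of that idea, not the paper's exact models: it assumes one-dimensional reports, a Gaussian likelihood around each cluster's running mean, and fixed, hypothetical values for the false-detection and new-target probabilities (P_FALSE, P_NEW); the paper's hypothesis-merging step is omitted.

```python
# Hypothetical sketch of sequential clustering with a particle filter.
# Each particle is an association hypothesis: a list of clusters, where
# each cluster is the list of reports assigned to it so far.
import math
import random

P_FALSE = 0.05  # assumed probability that a report is a false detection
P_NEW = 0.10    # assumed prior probability of a previously unobserved target
SIGMA = 1.0     # assumed spread of reports around their cluster centre

def likelihood(report, cluster):
    """Gaussian likelihood of a report given the cluster's running mean."""
    mean = sum(cluster) / len(cluster)
    return math.exp(-(report - mean) ** 2 / (2 * SIGMA ** 2))

def step(particles, report):
    """One filtering step: extend each hypothesis with the new report,
    weight it, then resample (sequential importance resampling)."""
    extended, weights = [], []
    for clusters in particles:
        new = [list(c) for c in clusters]  # copy before mutating
        r = random.random()
        if new and r < P_FALSE:
            w = P_FALSE                    # discard report as a false detection
        elif not new or r < P_FALSE + P_NEW:
            new.append([report])           # hypothesise an unobserved target
            w = P_NEW
        else:
            c = random.randrange(len(new)) # associate with an existing cluster
            w = likelihood(report, new[c])
            new[c].append(report)
        extended.append(new)
        weights.append(w)
    # Resample with replacement, proportional to the weights.
    return random.choices(extended, weights=weights, k=len(particles))

def cluster_count_posterior(particles):
    """Project the hypothesis distribution onto the number of clusters."""
    counts = {}
    for clusters in particles:
        counts[len(clusters)] = counts.get(len(clusters), 0) + 1
    n = len(particles)
    return {k: v / n for k, v in sorted(counts.items())}

random.seed(0)
particles = [[] for _ in range(500)]           # 500 empty hypotheses
for report in [0.1, 0.3, 5.2, 0.2, 5.0, 9.8]:  # three well-separated groups
    particles = step(particles, report)
print(cluster_count_posterior(particles))      # mass should concentrate near 3
```

Because each particle carries a complete clustering, the final projection onto the number of clusters comes for free, which is the point the abstract makes about avoiding repeated trial clusterings.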
