A random finite set model for data clustering

The goal of data clustering is to partition data points into groups so as to optimize a given objective function. While most existing clustering algorithms treat each data point as a vector, in many applications each datum is not a vector but a point pattern, that is, a set of points. Moreover, many existing clustering methods require the user to specify the number of clusters, which is often not known in advance. This paper proposes a new class of models for data clustering that handles set-valued data as well as an unknown number of clusters, using a Dirichlet process mixture of Poisson random finite sets. We also develop an efficient Markov chain Monte Carlo posterior inference technique that learns the number of clusters and the mixture parameters automatically from the data. Numerical studies are presented to demonstrate the salient features of this new model, in particular its capacity to discover extremely unbalanced clusters in data.
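To make the two ingredients named above concrete, the following is a minimal sketch and not the paper's implementation. It assumes a one-dimensional Gaussian feature density for the points of each set, writes the Poisson random finite set density in its standard form (up to the choice of reference measure), and uses the Chinese restaurant process as one common representation of the Dirichlet process prior over cluster assignments. The function names and parameters (`rate`, `mean`, `std`, `alpha`) are illustrative assumptions.

```python
import math
import random


def gaussian_logpdf(x, mean, std):
    """Log density of a 1-D Gaussian (assumed feature density, for illustration)."""
    return -0.5 * math.log(2 * math.pi * std ** 2) - (x - mean) ** 2 / (2 * std ** 2)


def poisson_rfs_logdensity(points, rate, mean, std):
    """Log density of a finite set X under a Poisson random finite set.

    Standard form: f(X) = exp(-rate) * prod_i rate * p(x_i), so the
    cardinality |X| is Poisson(rate) and the points are i.i.d. from p.
    """
    return -rate + sum(math.log(rate) + gaussian_logpdf(x, mean, std) for x in points)


def crp_assignments(n, alpha, rng):
    """Draw cluster labels for n data from a Chinese restaurant process.

    Datum i joins an existing cluster with probability proportional to its
    current size, or opens a new cluster with probability proportional to alpha.
    """
    labels, counts = [], []
    for _ in range(n):
        weights = counts + [alpha]  # the last slot stands for a new cluster
        k = rng.choices(range(len(weights)), weights=weights)[0]
        if k == len(counts):
            counts.append(1)
        else:
            counts[k] += 1
        labels.append(k)
    return labels


if __name__ == "__main__":
    rng = random.Random(0)
    # Each datum is a set of points rather than a single vector.
    data = [[0.1, -0.3, 0.2], [5.2, 4.9], []]
    print(crp_assignments(len(data), alpha=1.0, rng=rng))
    for X in data:
        print(poisson_rfs_logdensity(X, rate=2.0, mean=0.0, std=1.0))
```

In a mixture of this kind each cluster would carry its own rate and feature parameters, and a sampler would combine the prior weights above with the per-cluster set likelihood when reassigning a datum. That step is not shown here because the abstract does not specify it.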
