Distance-Based Probability Distribution for Set Partitions with Applications to Bayesian Nonparametrics

Integrating several types of data is a burgeoning field. Some data naturally lead to formal models; others may convey proximity among observations. Clustering methods are typically either model-based or distance-based, but not both. We propose a method that is simultaneously model-based and distance-based, permitting the use of both types of data. We show that the Dirichlet process induces a clustering distribution in which the probability that an item is clustered with another item is uniform across all items. We then extend this distribution to incorporate distance information, yielding a probability distribution over partitions that is indexed by pairwise distances between items. We show how to use this distance-based distribution as a prior clustering distribution in Bayesian nonparametric models, and we illustrate it with an application to ion mobility-mass spectrometry.
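
The contrast between uniform and distance-indexed clustering probabilities can be illustrated with a sequential allocation scheme. The following is a minimal sketch under stated assumptions, not the construction from the paper itself: the function names `allocation_probs` and `sample_partition`, the mass parameter `alpha`, the exponential-decay attraction, and the rescaling of attractions to sum to the number of allocated items are all illustrative choices. With no attraction function, the code reduces to the standard Chinese restaurant process rule implied by the Dirichlet process, in which each previously allocated item attracts the new item equally.

```python
import math
import random


def allocation_probs(i, clusters, alpha, attraction=None):
    """Probabilities that item i joins each existing cluster or opens a new one.

    clusters   : list of lists holding the indices of already-allocated items
    alpha      : mass parameter (assumed name), alpha > 0
    attraction : optional function (i, j) -> positive weight; if None, every
                 allocated item pulls equally (Chinese restaurant process)
    The last entry of the returned list is the new-cluster probability.
    """
    allocated = [j for c in clusters for j in c]
    n = len(allocated)
    if attraction is None:
        # Uniform case: a cluster's weight is simply its size.
        weights = [float(len(c)) for c in clusters]
    else:
        raw = {j: attraction(i, j) for j in allocated}
        total = sum(raw.values())
        # Assumption: rescale attractions to sum to n, so the new-cluster
        # probability stays alpha / (alpha + n) as in the uniform case.
        weights = [n * sum(raw[j] for j in c) / total for c in clusters]
    denom = alpha + n
    return [w / denom for w in weights] + [alpha / denom]


def sample_partition(n_items, alpha, attraction=None, seed=None):
    """Draw one partition of items 0..n_items-1 by sequential allocation."""
    rng = random.Random(seed)
    clusters = []
    for i in range(n_items):
        probs = allocation_probs(i, clusters, alpha, attraction)
        k = rng.choices(range(len(probs)), weights=probs)[0]
        if k == len(clusters):
            clusters.append([i])      # open a new cluster
        else:
            clusters[k].append(i)     # join existing cluster k
    return clusters


if __name__ == "__main__":
    # Hypothetical one-dimensional positions; distance = absolute difference.
    positions = [0.0, 0.1, 5.0, 0.05]

    def decay(i, j):
        # Assumed attraction: closer items pull harder.
        return math.exp(-abs(positions[i] - positions[j]))

    current = [[0, 1], [2]]
    print("uniform :", allocation_probs(3, current, alpha=1.0))
    print("distance:", allocation_probs(3, current, alpha=1.0, attraction=decay))
    print("draw    :", sample_partition(4, alpha=1.0, attraction=decay, seed=0))
```

In the uniform case the new item joins a cluster with probability proportional to its size alone; with the attraction function, the same total mass is redistributed toward clusters whose members are close to the new item.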
