Feature allocations, probability functions, and paintboxes

The problem of inferring a clustering of a data set has been the subject of much research in Bayesian analysis, and there currently exists a solid mathematical foundation for Bayesian approaches to clustering. In particular, the class of exchangeable probability distributions over partitions of a data set has been characterized in a number of ways, including via exchangeable partition probability functions (EPPFs) and the Kingman paintbox. Here, we develop a generalization of the clustering problem, called feature allocation, in which each data point may belong to an arbitrary, non-negative integer number of groups, now called features or topics. We define and study an “exchangeable feature probability function” (EFPF), analogous to the EPPF in the clustering setting, for certain types of feature models. Moreover, we introduce a “feature paintbox” characterization of the class of exchangeable feature models, analogous to the Kingman paintbox for clustering. We provide a further characterization of the subclass of feature allocations that have EFPF representations.
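The paintbox idea can be illustrated with a short simulation. The following is a minimal Python sketch under stated assumptions, not the paper's construction: it fixes one finite paintbox instead of drawing the sequence of subsets at random, it represents each subset of the unit interval as a membership predicate, and the helper names `interval`, `union`, and `sample_feature_allocation` are hypothetical. Each data point draws an independent uniform variable and joins every feature whose subset contains that draw. Disjoint intervals covering [0, 1) recover a clustering, as in the Kingman paintbox, while overlapping subsets let a point belong to several features or to none.

```python
import random
from typing import Callable, List, Sequence, Set

# Assumption of this sketch: a paintbox subset C_k of [0, 1) is represented
# by a membership predicate u -> bool rather than as a general measurable set.
Subset = Callable[[float], bool]


def interval(lo: float, hi: float) -> Subset:
    """The half-open interval [lo, hi) as a membership predicate."""
    return lambda u: lo <= u < hi


def union(*parts: Subset) -> Subset:
    """The union of several subsets (the parts may overlap)."""
    return lambda u: any(part(u) for part in parts)


def sample_feature_allocation(
    paintbox: Sequence[Subset], n_points: int, rng: random.Random
) -> List[Set[int]]:
    """Sample a feature allocation of the indices {0, ..., n_points - 1}.

    Each index n draws U_n ~ Uniform[0, 1) independently and joins
    feature k exactly when U_n lies in C_k = paintbox[k]. Because the U_n
    are i.i.d., relabelling the indices does not change the distribution,
    so the sampled allocation is exchangeable.
    """
    uniforms = [rng.random() for _ in range(n_points)]
    features: List[Set[int]] = []
    for subset in paintbox:
        members = {n for n, u in enumerate(uniforms) if subset(u)}
        if members:  # a feature with no members is not part of the allocation
            features.append(members)
    return features


if __name__ == "__main__":
    rng = random.Random(0)

    # Kingman-style special case: disjoint intervals covering [0, 1), so every
    # index lands in exactly one block and the allocation is a partition.
    kingman_like = [interval(0.0, 0.3), interval(0.3, 0.8), interval(0.8, 1.0)]
    print(sample_feature_allocation(kingman_like, 8, rng))

    # Overlapping subsets: an index can belong to zero, one, or two features.
    overlapping = [interval(0.0, 0.5), union(interval(0.25, 0.5), interval(0.75, 1.0))]
    print(sample_feature_allocation(overlapping, 8, rng))
```

In this sketch the probability that a point exhibits feature k is the length of C_k, and the probability that it exhibits both features of the overlapping example is the length of their intersection (0.25). Drawing the paintbox itself at random before sampling keeps the allocation exchangeable, since the uniform draws remain i.i.d. given the paintbox.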
