Feature allocations, probability functions, and paintboxes

The problem of inferring a clustering of a data set has been the subject of much research in Bayesian analysis, and there currently exists a solid mathematical foundation for Bayesian approaches to clustering. In particular, the class of exchangeable probability distributions over partitions of a data set has been characterized in a number of ways, including via exchangeable partition probability functions (EPPFs) and the Kingman paintbox. Here, we develop a generalization of the clustering problem, called feature allocation, in which each data point may belong to an arbitrary, non-negative integer number of groups, now called features or topics. For certain types of feature models, we define and study an “exchangeable feature probability function” (EFPF), analogous to the EPPF in the clustering setting. Moreover, we introduce a “feature paintbox” characterization of the class of exchangeable feature models, analogous to the Kingman paintbox for clustering. We also characterize the subclass of feature allocations that have EFPF representations.
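To make the EFPF concrete, here is a minimal Python sketch for one standard feature model, the one-parameter Indian buffet process (IBP) with mass parameter gamma. The choice of model is ours, for illustration only. Assuming the K features are labeled uniformly at random, we take its EFPF to be

p(N; K, n_1, ..., n_K) = (gamma^K / K!) exp(-gamma H_N) prod_{k=1..K} (n_k - 1)! (N - n_k)! / N!,

where n_k is the number of data points exhibiting feature k and H_N is the N-th harmonic number. This follows from Griffiths and Ghahramani's probability for left-ordered IBP matrices after accounting for the number of labelings, but it should be checked against the paper's own statement. The helper names (ibp_efpf, allocation_prob, total_mass_two_indices) are hypothetical. The function depends on the data only through N, K and the feature sizes, which is what makes it exchangeable. The last helper checks numerically that the implied probabilities of all feature allocations of two data points sum to one.

```python
from __future__ import annotations

import itertools
import math
from collections import Counter


def harmonic(n: int) -> float:
    """Harmonic number H_n = 1 + 1/2 + ... + 1/n."""
    return sum(1.0 / i for i in range(1, n + 1))


def ibp_efpf(n: int, counts: list[int], gamma: float) -> float:
    """Hypothetical helper: EFPF p(n; K, n_1..n_K) of the one-parameter IBP.

    counts[k] is the number of indices among the n data points that carry
    feature k. Assumes the K = len(counts) features are labeled uniformly at
    random, which contributes the 1/K! factor. The value depends only on n
    and the multiset of counts, so it is symmetric in its arguments.
    """
    if any(m < 1 or m > n for m in counts):
        return 0.0  # every feature must contain between 1 and n indices
    k = len(counts)
    value = gamma ** k / math.factorial(k) * math.exp(-gamma * harmonic(n))
    for m in counts:
        value *= math.factorial(m - 1) * math.factorial(n - m) / math.factorial(n)
    return value


def allocation_prob(n: int, features: list[set[int]], gamma: float) -> float:
    """Hypothetical helper: P(F_n = f) for an unlabeled feature allocation f.

    Features with identical membership are indistinguishable, so f has
    K! / prod_h K_h! distinct labelings, each with probability ibp_efpf(...).
    """
    counts = [len(s) for s in features]
    labelings = math.factorial(len(features))
    for reps in Counter(frozenset(s) for s in features).values():
        labelings //= math.factorial(reps)  # exact: multinomial coefficient
    return labelings * ibp_efpf(n, counts, gamma)


def total_mass_two_indices(gamma: float, cap: int = 20) -> float:
    """Hypothetical check: sum P(F_2 = f) over allocations of indices {0, 1}
    in which each possible feature ({0}, {1}, {0, 1}) appears at most cap times."""
    histories = [{0}, {1}, {0, 1}]
    total = 0.0
    for reps in itertools.product(range(cap + 1), repeat=len(histories)):
        features = [h for h, r in zip(histories, reps) for _ in range(r)]
        total += allocation_prob(2, features, gamma)
    return total


if __name__ == "__main__":
    # Approximately 1.0, up to floating-point and truncation error.
    print(total_mass_two_indices(1.0))
```

The split between the two functions is deliberate: ibp_efpf gives the probability of a single labeled allocation, so an unlabeled allocation in which a membership pattern h repeats K_h times picks up the multinomial factor K!/prod_h K_h!. The check truncates the number of copies of each feature at cap, which for moderate gamma leaves only a negligible Poisson tail.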
