The Attraction Indian Buffet Distribution

We propose the attraction Indian buffet distribution (AIBD), a distribution for binary feature matrices influenced by pairwise similarity information. Binary feature matrices are used in Bayesian models to uncover latent variables (i.e., features) that explain observed data. The Indian buffet process (IBP) is a popular exchangeable prior distribution for latent feature matrices. In the presence of additional information, however, the exchangeability assumption may be neither reasonable nor desirable. The AIBD incorporates pairwise similarity information, yet it preserves many properties of the IBP, including the distribution of the total number of features. Thus, much of the interpretation and intuition that one has for the IBP carries over directly to the AIBD. A temperature parameter controls the degree to which the similarity information affects feature-sharing between observations. Unlike other nonexchangeable distributions for feature allocations, the probability mass function of the AIBD has a tractable normalizing constant, making posterior inference on hyperparameters straightforward using standard MCMC methods. A novel posterior sampling algorithm is proposed for the IBP and the AIBD. We demonstrate the feasibility of the AIBD as a prior distribution in feature allocation models and compare the performance of competing methods in simulations and an application.
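To make the idea concrete, the following is a minimal illustrative sketch, not the paper's exact AIBD definition: a sequential, buffet-style sampler in which the probability that observation i takes an existing feature is weighted by i's tempered similarity to the previous observations holding that feature, while new features are drawn exactly as in the IBP. The function name `attraction_buffet_sketch` and the specific weighting scheme are assumptions for illustration; at temperature 0 the scheme reduces to the standard IBP counts.

```python
import numpy as np

def attraction_buffet_sketch(S, alpha, temperature, seed=0):
    """Illustrative sketch of a similarity-weighted Indian buffet scheme.

    This is a hypothetical construction in the spirit of the AIBD, not
    the paper's exact probability mass function.

    S           -- symmetric (n x n) matrix of positive pairwise similarities
    alpha       -- IBP mass parameter
    temperature -- >= 0; at 0, similarities are tempered to 1 and the
                   scheme reduces to the ordinary IBP
    """
    rng = np.random.default_rng(seed)
    n = S.shape[0]
    Z = np.zeros((n, 0), dtype=int)  # binary feature matrix, grows in columns
    for i in range(n):
        for k in range(Z.shape[1]):
            takers = np.flatnonzero(Z[:i, k])  # earlier observations with feature k
            # Tempered similarity of observation i to the takers of feature k.
            attraction = np.sum(S[i, takers] ** temperature)
            total = np.sum(S[i, :i] ** temperature)
            # Similarity-weighted count; when temperature == 0 every tempered
            # similarity equals 1, so m_k is the plain IBP count len(takers).
            m_k = i * attraction / total
            if rng.random() < m_k / (i + 1):
                Z[i, k] = 1
        # New features exactly as in the IBP, consistent with the abstract's
        # statement that the total-number-of-features distribution is preserved.
        new = rng.poisson(alpha / (i + 1))
        if new > 0:
            Z = np.hstack([Z, np.zeros((n, new), dtype=int)])
            Z[i, -new:] = 1
    return Z
```

Under this sketch, raising the temperature concentrates feature-sharing among similar observations, while temperature 0 recovers exchangeability, which mirrors the role the abstract assigns to the temperature parameter.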
