A Note on Bayesian Inference with Incomplete Multinomial Data with Applications for Assessing the Spatio-Temporal Variation in Pathogen-Variant Diversity

Summary: With recent advances in genetic analysis, it has become feasible to classify a pathogen into genetically distinct variants even though they apparently cause similar symptoms in an infected subject. The availability of such data opens up the interesting problem of studying the spatio-temporal variation in the diversity of variants of a pathogen. Data on pathogen variants often suffer from the problems of (i) low cell counts, (ii) incomplete classification due to laboratory problems, for example, contamination, and (iii) unseen variants. Shannon entropy may be employed as a measure of variant diversity. A Bayesian approach can be used to deal with the problems of low cell counts and unseen variants. Bayesian analysis of incomplete multinomial data may be carried out by Markov chain Monte Carlo techniques. However, for pathogen-variant data, it often happens that there is only one source of missingness, namely, some subjects are known to be infected by some unidentified pathogen variant. We point out that for incomplete data with disjoint sources of missingness, Bayesian analysis can be done more efficiently by an iid sampling scheme from the posterior distribution. We illustrate the method by analyzing a dataset on the prevalence of Bartonella infection among individual colonies of prairie dogs at a study site in Colorado, from 2003 to 2006.

Key words: Shannon entropy, Bartonella, Bayes factor, Dirichlet distribution, Pathogen diversity, Spatial epidemiology.
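
As an illustration of the idea of iid posterior sampling with a single source of missingness, the following is a minimal sketch, not the implementation used in the paper. It assumes a category layout that the summary does not spell out: one "uninfected" cell, K identified variant cells, and a count of subjects known to be infected by an unidentified variant. It also assumes a Dirichlet prior and ignorable missingness. Under those assumptions the posterior factorizes, so the uninfected probability can be drawn from a Beta distribution and the variant proportions among infected subjects from a Dirichlet distribution, with no Markov chain required. The function names, the uniform default prior, and the counts in the usage note are all hypothetical.

```python
import numpy as np


def sample_posterior(n_uninfected, n_variants, n_unidentified,
                     prior_uninfected=1.0, prior_variants=None,
                     size=10_000, rng=None):
    """Draw iid posterior samples of the cell probabilities.

    Assumed layout (hypothetical): cell 0 is "uninfected", cells 1..K are
    variants, and n_unidentified subjects are infected by an unknown variant.
    """
    rng = np.random.default_rng(rng)
    n_variants = np.asarray(n_variants, dtype=float)
    if prior_variants is None:
        # Assumed default: uniform Dirichlet prior over the variant cells.
        prior_variants = np.ones_like(n_variants)

    a0 = prior_uninfected + n_uninfected
    a = np.asarray(prior_variants, dtype=float) + n_variants

    # Partially classified subjects contribute (1 - p0)^m to the likelihood,
    # so they only update the infected-versus-uninfected split.
    p0 = rng.beta(a0, a.sum() + n_unidentified, size=size)
    # Variant proportions among infected subjects, independent of p0.
    q = rng.dirichlet(a, size=size)

    # Full probability vector: (p0, (1 - p0) * q_1, ..., (1 - p0) * q_K).
    p = np.column_stack([p0, (1.0 - p0)[:, None] * q])
    return p, q


def shannon_entropy(q):
    """Shannon entropy (natural log) along the last axis; 0 * log 0 is 0."""
    q = np.asarray(q, dtype=float)
    return -np.sum(q * np.log(np.clip(q, 1e-300, None)), axis=-1)
```

With made-up counts, `p, q = sample_posterior(20, [5, 3, 0, 1], 4, rng=0)` followed by `shannon_entropy(q)` yields one posterior draw of variant diversity per sample, from which posterior means or credible intervals can be summarized. A variant cell with zero observed count still receives positive posterior mass through the prior, which is how a sketch of this kind accommodates low counts and unseen variants.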