Bayes-optimal estimation of overlap between populations of fixed size

Measuring the overlap between two populations is, in principle, straightforward. When both populations are fully sampled, the number of shared objects (species, taxonomical units, or gene variants, depending on the context) can be counted directly. In practice, however, only a fraction of each population's objects is likely to be sampled, owing to stochastic data collection or sequencing techniques. Although methods exist for quantifying population overlap under subsampled conditions, their bias is well documented and the uncertainty of their estimates cannot be quantified. Here we derive and validate a method to rigorously estimate the population overlap from incomplete samples when the total number of objects, species, or genes in each population is known. This is a special case of the more general β-diversity problem, and it is particularly relevant in the ecology and genomic epidemiology of malaria. By solving a Bayesian inference problem, the method takes the rates of subsampling into account and produces unbiased and Bayes-optimal estimates of overlap. In addition, it provides a natural framework for computing the uncertainty of its estimates, and it can be used prospectively in study planning by quantifying the tradeoff between sampling effort and uncertainty.
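To make the inference problem concrete, the following is a minimal sketch of one way such a posterior could be computed. It is not taken from the paper. It assumes that each population is sampled uniformly at random without replacement, that the prior over the true overlap s is uniform, and that the only data are the two sample sizes and the number of objects seen in both samples. All names (`overlap_posterior`, `n_ab`, `N_a`, and so on) are hypothetical.

```python
from math import comb


def hypergeom_pmf(k, N, K, n):
    """P(k marked items) when drawing n of N items, K of them marked, without replacement."""
    if k < 0 or k > K or k > n or n - k > N - K:
        return 0.0
    return comb(K, k) * comb(N - K, n - k) / comb(N, n)


def overlap_posterior(n_ab, n_a, n_b, N_a, N_b):
    """Posterior over the true overlap s = 0..min(N_a, N_b), under a uniform prior.

    Assumed model (an illustration, not the paper's stated derivation):
      1. Of the n_a objects sampled from population A, k are truly shared,
         with k ~ Hypergeometric(N_a, s, n_a).
      2. Of the n_b objects sampled from population B, n_ab coincide with
         those k, with n_ab ~ Hypergeometric(N_b, k, n_b).
    The likelihood of s sums over the unobserved k.
    """
    s_max = min(N_a, N_b)
    weights = []
    for s in range(s_max + 1):
        likelihood = 0.0
        for k in range(n_ab, min(s, n_a) + 1):
            likelihood += hypergeom_pmf(k, N_a, s, n_a) * hypergeom_pmf(n_ab, N_b, k, n_b)
        weights.append(likelihood)
    total = sum(weights)
    if total == 0.0:
        raise ValueError("observed counts are impossible under the assumed model")
    return [w / total for w in weights]


def posterior_mean(posterior):
    """Posterior mean of s, the minimizer of expected squared error."""
    return sum(s * p for s, p in enumerate(posterior))


def credible_interval(posterior, mass=0.95):
    """Central credible interval for s, read off the cumulative posterior."""
    tail = (1.0 - mass) / 2.0
    cumulative, low, high = 0.0, None, None
    for s, p in enumerate(posterior):
        cumulative += p
        if low is None and cumulative >= tail:
            low = s
        if high is None and cumulative >= 1.0 - tail:
            high = s
    if high is None:  # guard against floating-point shortfall in the cumulative sum
        high = len(posterior) - 1
    return low, high


if __name__ == "__main__":
    # Hypothetical example: two populations of 60 objects, 30 sampled from each, 5 seen in both.
    post = overlap_posterior(n_ab=5, n_a=30, n_b=30, N_a=60, N_b=60)
    print(posterior_mean(post), credible_interval(post))
```

Under these assumptions, the width of the credible interval as a function of `n_a` and `n_b` gives a direct way to explore the tradeoff between sampling effort and uncertainty before data are collected. With full sampling (`n_a == N_a` and `n_b == N_b`), the posterior collapses onto the directly counted overlap.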
