Model-based analysis of latent factors

Abstract. The detection of community or population structure by explicit cause–effect modeling of given observations has received considerable attention. The complexity of the task is mirrored by the large number of existing approaches and methods, whose applicability depends heavily on the design of efficient data-analysis algorithms. It is occasionally difficult even to disentangle concepts from algorithms. To clarify this situation, the present paper elaborates a system-analytic framework that probably encompasses most of the common concepts and approaches by classifying them as model-based analyses of latent factors. Problems concerning the efficiency of algorithms are not of primary concern here. In essence, the framework suggests an input–output model system in which the inputs are provided as latent model parameters and the output is specified by the observations. Two types of model are involved: one organizes the inputs by assigning combinations of potentially interacting factor levels to each observed object, while the other specifies the mechanisms by which these combinations are processed to yield the observations. It is demonstrated briefly how some of the most popular methods (Structure, BAPS, Geneland) fit into the framework and how they differ conceptually from each other. Attention is drawn to the need to formulate and assess qualification criteria by which the validity of the model can be judged. One probably indispensable criterion concerns the cause–effect character of the model-based approach and suggests that measures of association between assignments of factor levels and observations be considered together with maximization of their likelihoods (or posterior probabilities). The likelihood criterion in particular is difficult to realize with commonly used estimates based on Markov chain Monte Carlo (MCMC) algorithms. Generally applicable MCMC-based alternatives are suggested that allow approximate application of the primary qualification criterion and the implied model validation, including further descriptors of model characteristics.
