Mathematical Foundations of Computer Science 2004

We introduce and analyse a simple model of genome evolution. It is based on two fundamental evolutionary events: gene loss and gene duplication. We are mainly interested in the asymptotic distributions of gene families in a genome. This is motivated by previous work that fitted the available genomic data to what are called paralog distributions. Two approaches are presented in this paper: continuous and discrete time models. We also compare them: the asymptotic distribution for the continuous time model can be seen as a limit of the discrete time distributions when the probabilities of gene loss and gene duplication tend to zero. We view this paper as an intermediate step towards mathematically settling the problem of characterizing the shape of the paralog distribution in bacterial genomes.