Checking the homogeneity of motif distribution in heterogeneous sequences

Studying the distribution of a motif along sequences may help to understand its biological function or to detect regions of interest. A statistical model is needed to assess the significance of the observed distribution. We propose a heterogeneous compound Poisson process that accounts both for possible overlaps between occurrences and for a heterogeneity of the sequence known a priori. The parameter estimation procedure is described, and tests of homogeneous sub-models are proposed. We also consider the detection of rich regions, using either cumulated distances or moving intervals, via a homogenization technique. The method is illustrated with applications to bacterial genomes.
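To make the two ingredients concrete, here is a minimal sketch, not the authors' implementation. It assumes that the known heterogeneity is a set of segments with a constant clump rate on each (the `breaks` and `rates` values are hypothetical), and that overlapping occurrences form clumps whose sizes are geometric, which is one common choice for a compound Poisson model. Homogenization is shown as the standard rescaling of positions by the cumulative intensity, which maps the clump positions to a homogeneous process on [0, 1].

```python
import random


def cumulative_intensity(x, breaks, rates):
    """Lambda(x): integral of a piecewise-constant clump rate from breaks[0] to x.

    breaks: segment boundaries [b0, b1, ..., bk] (hypothetical, known a priori)
    rates:  positive clump rate on each segment [b_i, b_{i+1})
    """
    total = 0.0
    for i, rate in enumerate(rates):
        lo, hi = breaks[i], breaks[i + 1]
        if x <= lo:
            break
        total += rate * (min(x, hi) - lo)
    return total


def simulate(breaks, rates, p_overlap, rng):
    """Simulate a heterogeneous compound Poisson process.

    Clump starts follow a homogeneous Poisson process within each segment
    (restarting at a boundary is valid because exponential gaps are memoryless).
    Each clump holds a geometric number of occurrences, size >= 1, with
    P(size = k) = (1 - p_overlap) * p_overlap**(k - 1) (an assumed clump law).
    """
    positions = []
    for i, rate in enumerate(rates):
        lo, hi = breaks[i], breaks[i + 1]
        x = lo
        while True:
            x += rng.expovariate(rate)
            if x >= hi:
                break
            size = 1
            while rng.random() < p_overlap:
                size += 1
            positions.extend([x] * size)
    return positions


def homogenize(positions, breaks, rates):
    """Rescale positions by Lambda(x) / Lambda(L); under the model, clump
    positions become those of a homogeneous process on [0, 1]."""
    total = cumulative_intensity(breaks[-1], breaks, rates)
    return [cumulative_intensity(x, breaks, rates) / total for x in positions]


if __name__ == "__main__":
    breaks, rates = [0, 100, 300], [0.01, 0.05]  # hypothetical segmentation
    occ = simulate(breaks, rates, p_overlap=0.2, rng=random.Random(0))
    print(len(occ), homogenize(occ, breaks, rates)[:5])
```

After rescaling, a rich region shows up as an excess of points in a short window of [0, 1], so window counts or spacings can be compared with their homogeneous null distribution instead of one that depends on the local rate.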