Factorizable language: from dynamics to bacterial complete genomes

Symbolic sequences generated by the symbolic dynamics of a dynamical system belong to a special class of languages in which every admissible word is both factorizable and prolongable. A factorizable language may also be defined from the complete genome sequence of an organism. A factorizable language has the useful property that it is entirely determined by its set of minimal forbidden words, or distinct excluded blocks (DEBs). We use this property to calculate, in the limit of infinitely long strings, the fractal dimension of patterns arising in a visualization scheme for under-represented strings in bacterial complete genomes. The same problem may also be solved by a purely combinatorial approach. The methods described in this paper may be applied to other regular fractals with self-similar and self-overlapping structure. (C) 2000 Elsevier Science B.V. All rights reserved.
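To make the notion of a minimal forbidden word concrete, the following is a minimal sketch, not the procedure used in the paper. It assumes that a word counts as a DEB when it is absent from the sequence while its longest proper prefix and its longest proper suffix are both present. The function names `factors` and `minimal_forbidden_words`, the parameter `max_len`, and the toy sequence are hypothetical choices made for illustration. On a short finite sequence a word may be absent merely because the sequence is short, so the output for a toy input should not be read as a statement about a real genome.

```python
from itertools import product


def factors(seq, k):
    """Return the set of all length-k substrings (factors) of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def minimal_forbidden_words(seq, alphabet="ACGT", max_len=4):
    """List words of length <= max_len that are absent from seq although
    their longest proper prefix and longest proper suffix both occur.

    Assumption: this brute-force enumeration is for illustration only;
    it visits every word over the alphabet up to max_len.
    """
    # Factors observed in seq, grouped by length; the empty word always occurs.
    present = {k: factors(seq, k) for k in range(1, max_len + 1)}
    present[0] = {""}

    debs = []
    for k in range(1, max_len + 1):
        for letters in product(alphabet, repeat=k):
            word = "".join(letters)
            if word in present[k]:
                continue  # the word is admissible, not forbidden
            # Minimal: dropping the last or the first letter gives a present word.
            if word[:-1] in present[k - 1] and word[1:] in present[k - 1]:
                debs.append(word)
    return debs


if __name__ == "__main__":
    # Toy two-letter example; a genome would use the default alphabet "ACGT".
    print(minimal_forbidden_words("AABAB", alphabet="AB", max_len=3))
    # prints ['BB', 'AAA', 'BAA']
```

In the toy example, `ABB` is absent from the sequence but is not reported, because it already contains the shorter forbidden word `BB`. This is the sense in which the set of minimal forbidden words determines the rest of the forbidden words.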