Combinatorics from Bacterial Genomes

By visualizing bacterial genome data we have encountered a few neat mathematical problems. The first problem concerns the number of longer missing strings (of length K + i, i ≥ 1) taken away by the absence of one or more K-strings. The exact solution of the problem may be obtained by using the Golden-Jackson cluster method in combinatorics and by making use of a special kind of formal languages, namely, the factorizable language. The second problem consists in explaining the fine structure observed in one-dimensional K-string histograms of some randomized genomes. The third problem is the uniqueness of reconstructing a protein sequence from its constituent K-peptides. The latter problem has a natural connection with the number of Eulerian loops in a graph. To tell whether a protein sequence has a unique reconstruction at a given K the factorizable language again comes to our help.