While visualizing bacterial genome data we encountered a few neat mathematical problems. The first problem concerns the number of longer missing strings (of length K + i, i ≥ 1) that are excluded by the absence of one or more K-strings. The exact solution can be obtained by using the Goulden-Jackson cluster method of combinatorics together with a special kind of formal language, namely the factorizable language. The second problem consists in explaining the fine structure observed in one-dimensional K-string histograms of some randomized genomes. The third problem is the uniqueness of reconstructing a protein sequence from its constituent K-peptides. This last problem has a natural connection with the number of Eulerian loops in a graph. To tell whether a protein sequence has a unique reconstruction at a given K, the factorizable language again comes to our help.
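The third problem can be illustrated with a small sketch. It is not the factorizable-language method mentioned above but a plain brute-force enumeration, assuming that the K-peptides are the overlapping K-strings of the sequence, that each K-peptide is treated as an edge between its (K-1)-prefix and its (K-1)-suffix, and that a reconstruction must start with the same (K-1)-prefix as the original. The function name `count_reconstructions` is hypothetical, and the running time grows exponentially, so it is only suitable for short toy sequences.

```python
from collections import Counter


def count_reconstructions(seq: str, k: int) -> int:
    """Count distinct sequences that have the same multiset of overlapping
    k-strings as `seq` and start with the same (k-1)-prefix.

    Each k-string is an edge from its (k-1)-prefix to its (k-1)-suffix, so
    every reconstruction is a walk that uses each edge exactly as often as
    it occurs in `seq` (an Eulerian path from the starting node).
    """
    if k < 2 or len(seq) < k:
        raise ValueError("need k >= 2 and len(seq) >= k")

    total_edges = len(seq) - k + 1

    # out[node][ch] = number of unused edges leaving `node` that append `ch`.
    out = {}
    for i in range(total_edges):
        kmer = seq[i:i + k]
        out.setdefault(kmer[:-1], Counter())[kmer[-1]] += 1

    def walk(node: str, used: int) -> int:
        # All edges consumed: one complete reconstruction found.
        if used == total_edges:
            return 1
        successors = out.get(node)
        if not successors:
            return 0
        found = 0
        # Branch on the appended letter, so parallel (identical) edges
        # are never counted as different reconstructions.
        for ch in list(successors):
            if successors[ch] > 0:
                successors[ch] -= 1
                found += walk(node[1:] + ch, used + 1)
                successors[ch] += 1
        return found

    return walk(seq[:k - 1], 0)


if __name__ == "__main__":
    # Toy strings, not real protein data.
    for k in (2, 3):
        print(k, count_reconstructions("ABACAD", k))
```

A return value of 1 means the sequence is uniquely reconstructible at that K under the assumptions above; a larger value means other sequences share the same K-peptide content.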