Every DNA molecule can be described, setting aside its three-dimensional structure, as a string (genome) over a set of only four elements (bases), denoted A, C, G, T. Some DNA sequences are repeated many times throughout the genome, with a biological function that is not yet understood. DNA is present in every living cell, and whenever a cell duplicates itself each offspring receives a complete copy of the original DNA. During these replication events mismatches may occur, due to insertions, deletions or substitutions (finite replications may also happen, but these can be seen as multiple insertions). Experimental biological research has shown that not every region of the DNA is equally likely to undergo a change. This is due to the supposed purpose of each region and to evolutionary reasons (a change in a region that does not affect macroscopic properties would be more likely). We will, however, set aside all considerations of this kind.

This simple algebraic representation can also be applied to other common biological structures, such as amino acid sequences, with different sets of base elements. In order to highlight the similarities and differences among instances of such strings, we want to define a good method of comparison. To do so we start from the comparison of two strings, but first we need some definitions. Note that Σ is any given alphabet and Σ* is the set of all finite strings over it.
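The string view described above can be illustrated with a minimal Python sketch. It is an illustration under stated assumptions, not part of the formal development: the names `DNA_ALPHABET`, `is_string_over`, `substitute`, `insert` and `delete` are hypothetical helpers introduced here, and positions are 0-based. The sketch checks membership in Σ* for the DNA alphabet and applies the three kinds of replication mismatch mentioned in the text.

```python
# Assumed representation: a genome is a Python str over the alphabet {A, C, G, T}.
DNA_ALPHABET = frozenset("ACGT")


def is_string_over(s, alphabet=DNA_ALPHABET):
    """Return True if every symbol of s belongs to the alphabet.

    The empty string is accepted, since it is a finite string over any alphabet.
    """
    return all(symbol in alphabet for symbol in s)


def substitute(s, i, base):
    """Replace the symbol at position i with base."""
    return s[:i] + base + s[i + 1:]


def insert(s, i, base):
    """Insert base immediately before position i."""
    return s[:i] + base + s[i:]


def delete(s, i):
    """Remove the symbol at position i."""
    return s[:i] + s[i + 1:]


# A hypothetical replication with one mismatch of each kind.
original = "ACGTAC"
copy = substitute(original, 2, "T")  # G at position 2 becomes T
copy = insert(copy, 0, "G")          # an extra G is added at the front
copy = delete(copy, 6)               # the last symbol is lost
```

Each helper returns a new string rather than modifying its argument, which mirrors the idea that the original and its copy are two distinct strings to be compared.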