Genetic signal representation and analysis

An original tetrahedral representation of the Genetic Code (GC), which better captures its structure, degeneracy and evolutionary trends, is defined. The possibility of reducing the dimensionality of the description by projecting the GC tetrahedron onto an adequately oriented plane is also considered, leading to complex representations of the GC. On this basis, optimal symbolic-to-digital mappings of the linear, one-dimensional and one-directional strands of nucleic acids into real or complex genetic signals are derived at the nucleotide, codon and amino acid levels. By converting nucleotide and polypeptide sequences into digital genetic signals, this approach makes a large variety of signal processing methods available for their analysis. It is also shown that some essential features of nucleotide sequences can be extracted more effectively with this representation. Some preliminary results of a comparative analysis of the statistical properties of intragenic vs. intergenic genetic signals are also presented. Finally, Independent Component Analysis (ICA) is suggested as a means of searching for control sequences in intergenic DNA, i.e., the part of the genome that does not encode proteins.
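The following minimal Python sketch illustrates the idea of placing the nucleotides on a tetrahedron and projecting it onto a plane to obtain a complex genetic signal. The vertex coordinates, the choice of the xy plane and the helper names (`TETRAHEDRON`, `to_genetic_signal`, `dft`) are assumptions made for this example and are not necessarily the mapping derived in the paper. The bases are placed at alternate vertices of the cube [-1, 1]^3, which form a regular tetrahedron, so that the x axis separates weak (A, T) from strong (C, G) bases, the y axis separates purines (A, G) from pyrimidines (C, T), and the z axis separates amino (A, C) from keto (G, T) bases. Dropping z gives the complex values A = 1+j, C = -1-j, G = -1+j, T = 1-j. A naive DFT stands in for the signal processing methods the mapping makes available.

```python
import cmath

# Assumed (illustrative) vertex assignment: alternate vertices of the cube
# [-1, 1]^3, forming a regular tetrahedron.
#   x: weak (A, T) = +1 vs strong (C, G) = -1
#   y: purine (A, G) = +1 vs pyrimidine (C, T) = -1
#   z: amino (A, C) = +1 vs keto (G, T) = -1
TETRAHEDRON = {
    "A": (1, 1, 1),
    "C": (-1, -1, 1),
    "G": (-1, 1, -1),
    "T": (1, -1, -1),
}


def project_to_complex(vertex):
    """Orthogonally project a 3-D vertex onto the xy plane, read as x + jy."""
    x, y, _z = vertex
    return complex(x, y)


# Complex mapping obtained from the projection:
# A -> 1+1j, C -> -1-1j, G -> -1+1j, T -> 1-1j
COMPLEX_MAP = {base: project_to_complex(v) for base, v in TETRAHEDRON.items()}


def to_genetic_signal(sequence):
    """Convert a nucleotide string (A, C, G, T) into a list of complex samples."""
    return [COMPLEX_MAP[base] for base in sequence.upper()]


def dft(signal):
    """Naive O(N^2) discrete Fourier transform, for illustration only."""
    n = len(signal)
    return [
        sum(signal[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


if __name__ == "__main__":
    sig = to_genetic_signal("ATGGCGTAA")
    print(sig)
    print([round(abs(c), 3) for c in dft(sig)])
```

For example, `to_genetic_signal("ATG")` returns `[(1+1j), (1-1j), (-1+1j)]`. With this particular placement, Watson-Crick complements map to complex conjugates (A/T and C/G), so the signal of the reverse-complement strand is the reversed sequence of conjugated samples.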
