Genomic signals of chromosomes and of concatenated reoriented coding regions

Symbolic nucleotide sequences are converted into digital genomic signals by using a complex representation derived from a tetrahedral vector representation of nucleotides. Studying these complex genomic signals with signal processing methods reveals large-scale features of chromosomes that would be difficult to grasp with the statistical and pattern-matching methods used to analyze symbolic genomic sequences. However, when large volumes of data are processed at various resolutions and the results must be visualized for human readers, the problem of data representability becomes critical. A novel mathematical description of data representability, based on the data scattering ratio on a pixel, is defined and applied to several typical cases of standard signals and to genomic signals. It is shown that the variation of genomic data along nucleotide sequences, specifically the cumulated and unwrapped phase, can be visualized adequately as simple graphic lines at small and large scales, while at medium scales (thousands to tens of thousands of base pairs) statistical descriptions have to be used.
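As an illustration of the conversion and the two phase signals named above, the following is a minimal Python sketch and not the authors' implementation. The abstract does not give the nucleotide-to-complex mapping; the quadrant assignment used here (A = 1+j, C = -1-j, G = -1+j, T = 1-j) is an assumption, and the names `NUCLEOTIDE_TO_COMPLEX`, `to_signal`, `cumulated_phase`, and `unwrapped_phase` are hypothetical. The cumulated phase is taken as the running sum of the phases of the samples, and the unwrapped phase as the running sum of phase steps between consecutive samples, each folded into one turn.

```python
import cmath
import math

# Assumed mapping (not stated in the abstract): one nucleotide per quadrant
# of the complex plane.
NUCLEOTIDE_TO_COMPLEX = {
    "A": complex(1, 1),
    "C": complex(-1, -1),
    "G": complex(-1, 1),
    "T": complex(1, -1),
}


def to_signal(sequence):
    """Convert a string of A/C/G/T symbols into a list of complex samples."""
    return [NUCLEOTIDE_TO_COMPLEX[base] for base in sequence.upper()]


def cumulated_phase(signal):
    """Running sum of the phase of each sample, from the first sample on."""
    total = 0.0
    result = []
    for sample in signal:
        total += cmath.phase(sample)
        result.append(total)
    return result


def unwrapped_phase(signal):
    """Phase with each step between consecutive samples folded into [-pi, pi).

    Steps of exactly half a turn (A<->C, G<->T under the assumed mapping)
    are inherently ambiguous in sign and sensitive to floating-point rounding.
    """
    if not signal:
        return []
    result = [cmath.phase(signal[0])]
    for previous, current in zip(signal, signal[1:]):
        step = cmath.phase(current) - cmath.phase(previous)
        step = (step + math.pi) % (2 * math.pi) - math.pi
        result.append(result[-1] + step)
    return result


if __name__ == "__main__":
    samples = to_signal("AGCTTGA")
    print(cumulated_phase(samples))
    print(unwrapped_phase(samples))
```

For a whole chromosome the same two functions would produce one value per base pair, which is the kind of long signal whose representability at different scales the abstract discusses.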
