Genomic signal processing

Summary form only given. The sequencing of several genomes offers the opportunity to mine and explore this unique data repository in depth. Converting genomic sequences into digital genomic signals makes it possible to use signal processing methods to handle and analyze genomic information. Using the genomic signal approach, long-range features maintained over distances of 10^6 to 10^8 base pairs have been found. When large volumes of data must be analyzed and the results presented in an easy-to-read form, the problem of data representability becomes critical. In this paper, a novel mathematical description of the graphical representability of data, based on the data scattering ratio for a pixel, is defined and applied to several typical cases of standard signals and to genomic signals. It is shown that the variation of genomic data along nucleotide sequences, specifically the cumulated and unwrapped phase, can be visualized adequately as simple graphic lines at small and large scales, while at medium scales (thousands to tens of thousands of base pairs) statistical descriptions have to be used.
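As an illustration of the conversion step, the following minimal Python sketch maps a nucleotide sequence to a complex signal and computes its cumulated and unwrapped phase. The abstract does not specify the mapping or the exact phase definitions, so everything here is an assumption: the table `NUCLEOTIDE_TO_COMPLEX` places the four bases on the vertices of a square in the complex plane, the cumulated phase is taken as the running sum of the element phases, and the unwrapped phase is taken as the phase corrected so that each step between successive elements lies in (-pi, pi]. All function names are hypothetical.

```python
import cmath
import math

# ASSUMPTION: the abstract does not give the mapping. Each base is placed
# on a vertex of a square in the complex plane, for illustration only.
NUCLEOTIDE_TO_COMPLEX = {
    "A": complex(1, 1),
    "C": complex(-1, -1),
    "G": complex(-1, 1),
    "T": complex(1, -1),
}


def to_genomic_signal(sequence):
    """Convert a string of A/C/G/T into a list of complex samples."""
    return [NUCLEOTIDE_TO_COMPLEX[base] for base in sequence.upper()]


def cumulated_phase(signal):
    """Running sum of the phases of the samples (assumed definition)."""
    total = 0.0
    result = []
    for sample in signal:
        total += cmath.phase(sample)
        result.append(total)
    return result


def wrap_step(step):
    """Bring a phase difference into the interval (-pi, pi]."""
    while step > math.pi:
        step -= 2.0 * math.pi
    while step <= -math.pi:
        step += 2.0 * math.pi
    return step


def unwrapped_phase(signal):
    """Phase with 2*pi jumps removed between successive samples
    (assumed definition)."""
    result = []
    previous = None
    for sample in signal:
        current = cmath.phase(sample)
        if previous is None:
            result.append(current)
        else:
            # Accumulate the wrapped step instead of the raw difference.
            result.append(result[-1] + wrap_step(current - previous))
        previous = current
    return result


if __name__ == "__main__":
    signal = to_genomic_signal("AGCTTGCA")
    print(cumulated_phase(signal))
    print(unwrapped_phase(signal))
```

Under these assumptions, both phase sequences are real-valued curves over the nucleotide index, which is the kind of data the abstract describes plotting as graphic lines or summarizing statistically, depending on the scale. Transitions between opposite vertices (for example A to C) produce a step of magnitude pi, where the choice of sign is a convention of `wrap_step` rather than something stated in the source.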