Consistency in representation and transformation of genomic sequences

Many results in genomic sequence analysis rely on representing genomic structures as numerical sequences, and various mapping strategies have been proposed for representing genomic and proteomic sequences. However, little is understood about how the specific choice of numerical mapping affects the final analysis results. In fact, inconsistent numerical mappings may have led to contradictory results in genomic sequence analysis. In this paper, we propose a mathematical framework for analyzing the consistency of representations and transformations of numerical mappings of genomic sequences. We introduce strong and weak correlation metrics to characterize consistency among distinct numerical mappings, and we derive sufficient conditions that ensure consistency among different numerical mappings. We present an important class of transforms that are equivalent under the proposed consistency conditions. We also derive a class of operators that is shown to be equivalent under rotation of numerical mappings. Finally, we conduct computer simulation experiments on DNA sequences that demonstrate the theoretical results.
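To make the notion of mapping-dependent results concrete, the following is a minimal sketch in Python. It is not the framework or the correlation metrics proposed in the paper: the toy sequence, the three integer mappings, and the use of the ordinary Pearson correlation coefficient as a stand-in consistency check are all assumptions chosen for illustration. It shows that two mappings related by an affine rescaling produce perfectly correlated numerical sequences, whereas a mapping that permutes symbol values does not.

```python
# Illustrative only: the sequence, the mappings, and the Pearson coefficient
# are assumptions for this sketch, not the metrics defined in the paper.
from math import sqrt


def encode(seq, mapping):
    """Turn a DNA string into a list of numbers using a symbol-to-value map."""
    return [mapping[base] for base in seq]


def pearson(x, y):
    """Ordinary Pearson correlation coefficient of two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


seq = "ATGCGTACGTTAGCCGATAGCTAGGCTA"  # hypothetical toy sequence

map_a = {"A": 0, "C": 1, "G": 2, "T": 3}                # baseline integer mapping
map_b = {base: 2 * v + 5 for base, v in map_a.items()}  # affine rescaling of map_a
map_c = {"A": 0, "C": 2, "G": 1, "T": 3}                # values of C and G swapped

x_a, x_b, x_c = (encode(seq, m) for m in (map_a, map_b, map_c))

r_affine = pearson(x_a, x_b)    # affine-related mappings: correlation of 1
r_permuted = pearson(x_a, x_c)  # permuted mapping: correlation below 1

print("affine-related mappings:", r_affine)
print("permuted mapping:", r_permuted)
```

Any analysis that depends only on correlation structure would therefore agree for `map_a` and `map_b` but could differ for `map_c`, which is the kind of discrepancy that consistency conditions are meant to rule out.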
