Paper information - Linear-time algorithms for computing maximum-density sequence segments with bioinformatics applications

We study an abstract optimization problem arising from biomolecular sequence analysis. For a sequence $A$ of pairs $(a_i, w_i)$ for $i = 1, \ldots, n$ with $w_i > 0$, a segment $A(i,j)$ is a consecutive subsequence of $A$ starting with index $i$ and ending with index $j$. The width of $A(i,j)$ is $w(i,j) = \sum_{i \le k \le j} w_k$, and the density is $\left(\sum_{i \le k \le j} a_k\right) / w(i,j)$. The maximum-density segment problem takes $A$ and two values $L$ and $U$ as input and asks for a segment of $A$ with the largest possible density among those of width at least $L$ and at most $U$. When $U$ is unbounded, we provide a relatively simple, $O(n)$-time algorithm, improving upon the $O(n \log L)$-time algorithm by Lin, Jiang and Chao. We then extend this result, providing an $O(n)$-time algorithm for the case when both $L$ and $U$ are specified.
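To make the problem statement concrete, the following is a minimal brute-force sketch of the definitions above. It is not the linear-time algorithm of the paper: it checks every segment in O(n^2) time using prefix sums and is meant only as a reference for what a correct answer looks like. The function name `max_density_segment`, the use of `None` for an unbounded `U`, and the restriction to integer values (so that densities can be compared exactly with `Fraction`) are assumptions made for illustration.

```python
from fractions import Fraction
from itertools import accumulate


def max_density_segment(pairs, L, U=None):
    """Return (density, i, j) for a densest segment with L <= width <= U.

    Indices are 1-based and inclusive, matching A(i, j) in the abstract.
    U=None means the width is unbounded above. Returns None when no
    segment is feasible. Assumes integer a_k and positive integer w_k.
    Quadratic-time baseline, not the O(n) method described in the paper.
    """
    n = len(pairs)
    # Prefix sums: sum_a[k] = a_1 + ... + a_k, sum_w[k] = w_1 + ... + w_k.
    sum_a = [0] + list(accumulate(a for a, _ in pairs))
    sum_w = [0] + list(accumulate(w for _, w in pairs))

    best = None
    for i in range(1, n + 1):
        for j in range(i, n + 1):
            width = sum_w[j] - sum_w[i - 1]
            if U is not None and width > U:
                break  # widths only grow with j because every w_k > 0
            if width < L:
                continue
            density = Fraction(sum_a[j] - sum_a[i - 1], width)
            if best is None or density > best[0]:
                best = (density, i, j)
    return best


# Hypothetical input: unit widths, a_k = 1 marks a "hit" (e.g. a G or C base).
example = [(1, 1), (0, 1), (1, 1), (1, 1), (0, 1)]
print(max_density_segment(example, 2, 3))  # (Fraction(1, 1), 3, 4)
```

With unit widths and 0/1 values, the density of a segment is simply the fraction of marked positions in it, which is how the problem relates to locating GC-rich regions. A baseline like this is mainly useful for cross-checking a faster implementation on small inputs.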

Ming-Yang Kao | Hsueh-I Lu | Michael H. Goldwasser

[1] Howard Ochman, et al. Isochores result from mutation not selection, 1999, Nature.

[2] R. K. Assoian, et al. A GC-rich domain with bifunctional effects on mRNA and protein levels: implications for control of transforming growth factor beta 1 expression, 1993, Molecular and cellular biology.

[3] G. Bernardi, et al. Isochores and the evolutionary genomics of vertebrates, 2000, Gene.

[4] A. Sobel, et al. The Journal of Biological Chemistry, 2009, Nutrition reviews.

[5] Sung Kwon Kim, et al. Linear-time algorithm for finding a maximum-density segment of a sequence, 2003, Inf. Process. Lett.

[6] P ? ? ? ? ? ? ? % ? ? ? ? , 1991 .

[7] ScienceDirect. Bulletin of mathematical biology, 1973.

[8] P. Guldberg, et al. Detection of mutations in GC-rich DNA by bisulphite denaturing gradient gel electrophoresis, 1998, Nucleic acids research.

[9] R. Novick, et al. Why is the initiation nick site of an AT-rich rolling circle plasmid at the tip of a GC-rich cruciform?, 1997, The EMBO journal.

[10] Ming-Yang Kao, et al. Fast Algorithms for Finding Maximum-Density Segments of a Sequence with Applications to Bioinformatics, 2002, WABI.

[11] N. N. Alexandrov, et al. Statistical significance of ungapped sequence alignments, 1998, Pacific Symposium on Biocomputing.

[12] IF You Discover, et al. THE BIOLOGICAL SCIENCE, 1923, Science.

[13] G. Bernardi, et al. The distribution of interspersed repeats is nonuniform and conserved in the mouse and human genomes, 1983, Proceedings of the National Academy of Sciences of the United States of America.

[14] Jon Louis Bentley, et al. Programming pearls, 1987, CACM.

[15] A. Clark, et al. Local rates of recombination are positively correlated with GC content in the human genome, 2001, Molecular biology and evolution.

[16] P. Sharp, et al. DNA sequence evolution: the sounds of silence, 1995, Philosophical transactions of the Royal Society of London. Series B, Biological sciences.

[17] G. Bernardi, et al. The gene distribution of the human genome, 1996, Gene.

[18] L. Duret, et al. Statistical analysis of vertebrate sequences reveals that long genes are scarce in GC-rich isochores, 1995, Journal of Molecular Evolution.

[19] Brian Charlesworth, et al. Genetic Recombination: Patterns in the genome, 1994, Current Biology.

[20] A. Nekrutenko, et al. Assessment of compositional heterogeneity within and between eukaryotic genomes, 2000, Genome research.

[21] J. Lakowicz, et al. Texture Analysis of Fluorescence Lifetime Images of AT- and GC-rich Regions in Nuclei, 2001, The journal of histochemistry and cytochemistry: official journal of the Histochemistry Society.

[22] Hsueh-I Lu, et al. An Optimal Algorithm for the Maximum-Density Segment Problem, 2003, ESA.

[23] J. Mattick, et al. Genome research, 1990, Nature.

[24] Chris A. Fields, et al. gm: a practical tool for automating DNA sequence analysis, 1990, Comput. Appl. Biosci.

[25] G. Bernardi, et al. An approach to the organization of eukaryotic genomes at a macromolecular level, 1976, Journal of molecular biology.

[26] A. Meyers. Reading, 1999, Language Teaching.

[27] K. Ikehara, et al. A possible origin of newly-born bacterial genes: significance of GC-rich nonstop frame on antisense strand, 1996, Nucleic acids research.

[28] G. Bernardi, et al. Compositional constraints and genome evolution, 2005, Journal of Molecular Evolution.

[29] Mike O'Donnell, et al. Resolving a Fidelity Paradox, 2002, The Journal of Biological Chemistry.

[30] X. Huang, et al. An algorithm for identifying regions of a DNA sequence that satisfy a content requirement, 1994, Comput. Appl. Biosci.

[31] J. Osinga, et al. Improved mutation detection in GC-rich DNA fragments by combined DGGE and CDGE, 1999, Nucleic acids research.

[32] G. Holmquist, et al. Chromosome bands, their chromatin flavors, and their functional features, 1992, American journal of human genetics.

[33] A. Eyre-Walker, et al. Evidence that both G + C rich and G + C poor isochores are replicated early and late in the cell cycle, 1992, Nucleic acids research.

[34] I. Longden, et al. EMBOSS: the European Molecular Biology Open Software Suite, 2000, Trends in genetics: TIG.

[35] N. Sueoka. Directional mutation pressure and neutral molecular evolution, 1988, Proceedings of the National Academy of Sciences of the United States of America.

[36] Ross B. Inman, et al. A denaturation map of the λ phage DNA molecule determined by electron microscopy, 1966.

[37] S. Schwartz, et al. Sequence and comparative analysis of the rabbit alpha-like globin gene cluster reveals a rapid mode of evolution in a G + C-rich region of mammalian genomes, 1991, Journal of molecular biology.

[38] Wen-Hsiung Li, et al. Mutation rates differ among regions of the mammalian genome, 1989, Nature.

[39] W. Miller, et al. Comparison of five methods for finding conserved sequences in multiple alignments of gene regulatory regions, 1999, Nucleic acids research.

[40] Yaw-Ling Lin, et al. Efficient algorithms for locating the length-constrained heaviest segments with applications to biomolecular sequence analysis, 2002, J. Comput. Syst. Sci.

[41] P. Sellers. Pattern recognition in genetic sequences by mismatch density, 1984.

[42] Adam Eyre-Walker, et al. Recombination and mammalian genome evolution, 1993, Proceedings of the Royal Society of London. Series B: Biological Sciences.

[43] W. Henke, et al. Betaine improves the PCR amplification of GC-rich DNA sequences, 1997, Nucleic acids research.

[44] G. Owens, et al. Interaction of CArG Elements and a GC-rich Repressor Element in Transcriptional Regulation of the Smooth Muscle Myosin Heavy Chain Gene in Vascular Smooth Muscle Cells, 1997, The Journal of Biological Chemistry.

[45] Ronald I. Greenberg, et al. Fast and Space-Efficient Location of Heavy or Dense Segments in Run-Length Encoded Sequences (Extended Abstract), 2003, COCOON.

[46] A. R. Wagner. Molecular Biology and Evolution, 2001.

[47] J. Filipski, et al. Correlation between molecular clock ticking, codon usage, fidelity of DNA repair, chromosome banding and chromatin compactness in germline cells, 1987, FEBS letters.