Analysis and compression of genomic sequences
[1] Ian H. Witten,et al. Arithmetic coding for data compression , 1987, CACM.
[2] Sudhir Kumar,et al. Multiple sequence alignment: in pursuit of homologous DNA positions. , 2007, Genome research.
[3] Ioan Tabus,et al. An efficient normalized maximum likelihood algorithm for DNA sequence compression , 2005, TOIS.
[4] Kimmo Fredriksson,et al. Shift-or string matching with super-alphabets , 2003, Inf. Process. Lett..
[5] Norman Abramson,et al. Information theory and coding , 1963 .
[6] John Shawe-Taylor,et al. Fast string matching using an n‐gram algorithm , 1994, Softw. Pract. Exp..
[7] Jean-Paul Delahaye,et al. Fast Discerning Repeats in DNA Sequences with a Compression Algorithm , 1997 .
[8] Simon Cawley,et al. Applications of generalized pair hidden Markov models to alignment and gene finding problems. , 2002 .
[9] Udi Manber,et al. A FAST ALGORITHM FOR MULTI-PATTERN SEARCHING , 1999 .
[10] Marc-Thorsten Hütt,et al. Genome Phylogeny Based on Short-Range Correlations in DNA Sequences , 2005, J. Comput. Biol..
[11] Slava M. Katz,et al. Estimation of probabilities from sparse data for the language model component of a speech recognizer , 1987, IEEE Trans. Acoust. Speech Signal Process..
[12] D. Lipman,et al. Improved tools for biological sequence comparison. , 1988, Proceedings of the National Academy of Sciences of the United States of America.
[13] Tanya Z. Berardini,et al. PatMatch: a program for finding patterns in peptide and nucleotide sequences , 2005, Nucleic Acids Res..
[14] N. Goodman. Biological data becomes computer literate: new advances in bioinformatics. , 2002, Current opinion in biotechnology.
[15] G. F. Joyce. The antiquity of RNA-based evolution , 2002, Nature.
[16] Ian H. Witten,et al. Data mining in bioinformatics using Weka , 2004, Bioinform..
[17] M S Waterman,et al. Identification of common molecular subsequences. , 1981, Journal of molecular biology.
[18] Ming Li,et al. An Introduction to Kolmogorov Complexity and Its Applications , 2019, Texts in Computer Science.
[19] Chuong B. Do,et al. Access the most recent version at doi: 10.1101/gr.926603 References , 2003 .
[20] Durbin,et al. Biological Sequence Analysis , 1998 .
[21] Kimmo Fredriksson,et al. Faster String Matching with Super-Alphabets , 2002, SPIRE.
[22] T. Oikonomou,et al. Power law exponents characterizing human DNA. , 2007, Physical review. E, Statistical, nonlinear, and soft matter physics.
[23] Deborah Joseph,et al. Beyond tandem repeats: complex pattern structures and distant regions of similarity , 2002, ISMB.
[24] S. Naranan,et al. Information Theory and Algorithmic Complexity: Applications to Language Discourses and DNA Sequences as Complex Systems Part II: Complexity of DNA Sequences, Analogy with Linguistic Discourses , 2000, J. Quant. Linguistics.
[25] Vladimir D. Gusev,et al. On the complexity measures of genetic sequences , 1999, Bioinform..
[26] D. Mount. Bioinformatics: Sequence and Genome Analysis , 2001 .
[27] Trevor I. Dix,et al. Sequence Complexity for Biological Sequence Analysis , 2000, Comput. Chem..
[28] L. Patthy. Modular Assembly of Genes and the Evolution of New Functions , 2003, Genetica.
[29] R. Voss,et al. Evolution of long-range fractal correlations and 1/f noise in DNA base sequences. , 1992, Physical review letters.
[30] Gregory Kucherov,et al. mreps: efficient and flexible detection of tandem repeats in DNA , 2003, Nucleic Acids Res..
[31] Arnaud Lefebvre,et al. FORRepeats: detects repeats on entire chromosomes and between genomes , 2003, Bioinform..
[32] Jacques Cohen,et al. Computer science and bioinformatics , 2005, CACM.
[33] O. White,et al. A quality control algorithm for DNA sequencing projects. , 1993, Nucleic acids research.
[34] H. Herzel. Complexity of symbol sequences , 1988 .
[35] Richard Clark Pasco,et al. Source coding algorithms for fast data compression , 1976 .
[36] En-Hui Yang,et al. Grammar-based codes: A new class of universal lossless source codes , 2000, IEEE Trans. Inf. Theory.
[37] Sean R. Eddy,et al. Biological Sequence Analysis: Probabilistic Models of Proteins and Nucleic Acids , 1998 .
[38] D. Kugiumtzis,et al. Statistical analysis of gene and intergenic DNA sequences , 2004, q-bio/0404024.
[39] E. Myers,et al. Basic local alignment search tool. , 1990, Journal of molecular biology.
[40] Noam Chomsky,et al. The Logical Structure of Linguistic Theory , 1975 .
[41] Stéphane Grumbach,et al. Compression of DNA sequences , 1993, [Proceedings] DCC `93: Data Compression Conference.
[42] Thierry Lecroq,et al. Fast exact string matching algorithms , 2007, Inf. Process. Lett..
[43] Eric V. Denardo,et al. Dynamic Programming: Models and Applications , 2003 .
[44] J. S. Heslop-Harrison,et al. Genomes, genes and junk: the large-scale organization of plant chromosomes , 1998 .
[45] Yong Zhang,et al. DNA sequence compression using the Burrows-Wheeler Transform , 2002, Proceedings. IEEE Computer Society Bioinformatics Conference.
[46] Armando J. Pinho,et al. Exploring Three-Base Periodicity for DNA Compression and Modeling , 2006, 2006 IEEE International Conference on Acoustics Speech and Signal Processing Proceedings.
[47] Sam Kwong,et al. A Compression Algorithm for DNA Sequences and Its Applications in Genome Comparison. , 1999 .
[48] Frantisek Franek,et al. A simple fast hybrid pattern-matching algorithm , 2007, J. Discrete Algorithms.
[49] D. Lipman,et al. Rapid similarity searches of nucleic acid and protein data banks. , 1983, Proceedings of the National Academy of Sciences of the United States of America.
[50] Arun Krishnan,et al. Exhaustive whole-genome tandem repeats search , 2004, Bioinform..
[51] Trevor I. Dix,et al. Comparative analysis of long DNA sequences by per element information content using different contexts , 2007, BMC Bioinformatics.
[52] Nikolay V. Dokholyan,et al. Similarity and dissimilarity in correlations of genomic DNA , 2007 .
[53] Ming Li,et al. Superiority and complexity of the spaced seeds , 2006, SODA 2006.
[54] Donald E. Knuth,et al. Fast Pattern Matching in Strings , 1977, SIAM J. Comput..
[55] P. Sellers. On the Theory and Computation of Evolutionary Distances , 1974 .
[56] D. Huffman. A Method for the Construction of Minimum-Redundancy Codes , 1952 .
[57] John C. Wootton,et al. Discovering Simple Regions in Biological Sequences Associated with Scoring Schemes , 2003, J. Comput. Biol..
[58] Behshad Behzadi,et al. DNA Compression Challenge Revisited: A Dynamic Programming Approach , 2005, CPM.
[59] Neri Merhav,et al. Hidden Markov processes , 2002, IEEE Trans. Inf. Theory.
[60] Serap A. Savari,et al. On the entropy of DNA: algorithms and measurements based on memory and rapid convergence , 1995, SODA '95.
[61] Dina Sokol,et al. Filtering Tandem Repeats in DNA Sequences , 2006, BIOCOMP.
[62] Ka-Lok Ng,et al. Quantitative linguistic study of DNA sequences , 2003 .
[63] Abraham Lempel,et al. A universal algorithm for sequential data compression , 1977, IEEE Trans. Inf. Theory.
[64] Giovanni Manzini,et al. A simple and fast DNA compressor , 2004, Softw. Pract. Exp..
[65] W. Pearson. Searching protein sequence libraries: comparison of the sensitivity and selectivity of the Smith-Waterman and FASTA algorithms. , 1991, Genomics.
[66] Trevor I. Dix,et al. A Simple Statistical Algorithm for Biological Sequence Compression , 2007, 2007 Data Compression Conference (DCC'07).
[67] En-Hui Yang,et al. Estimating DNA sequence entropy , 2000, SODA '00.
[68] Robert S. Boyer,et al. A fast string searching algorithm , 1977, CACM.
[69] Ian H. Witten,et al. The zero-frequency problem: Estimating the probabilities of novel events in adaptive text compression , 1991, IEEE Trans. Inf. Theory.
[70] Pietro Lio,et al. Statistical analysis of simple repeats in the human genome , 2005, q-bio/0502009.
[71] H. Müller,et al. Statistical methods for DNA sequence segmentation , 1998 .
[72] C. Peng,et al. Long-range correlations in nucleotide sequences , 1992, Nature.
[73] John Case,et al. Computing Entropy for Ortholog Detection , 2004, International Conference on Computational Intelligence.
[74] R. Rosenfeld,et al. Two decades of statistical language modeling: where do we go from here? , 2000, Proceedings of the IEEE.
[75] Trevor I. Dix,et al. Compression and Approximate Matching , 1999, Comput. J..
[76] Gaston H. Gonnet,et al. A new approach to text searching , 1992, CACM.
[77] M Dauchet,et al. Compression and genetic sequence analysis. , 1996, Biochimie.
[78] R. Mantegna,et al. Systematic analysis of coding and noncoding DNA sequences using methods of statistical linguistics. , 1995, Physical review. E, Statistical physics, plasmas, fluids, and related interdisciplinary topics.
[79] Mateo Valero,et al. Performance Analysis of Sequence Alignment Applications , 2006, 2006 IEEE International Symposium on Workload Characterization.
[80] Ian Witten,et al. Data Mining , 2000 .
[81] John G. Cleary,et al. Unbounded length contexts for PPM , 1995, Proceedings DCC '95 Data Compression Conference.
[82] Richard R. Sinden,et al. Triplet repeat DNA structures and human genetic disease: dynamic mutations from dynamic DNA , 2002, Journal of Biosciences.
[83] A Hariri,et al. On the validity of Shannon-information calculations for molecular biological sequences. , 1990, Journal of theoretical biology.
[84] Bin Ma,et al. PatternHunter: faster and more sensitive homology search , 2002, Bioinform..
[85] William B. Langdon,et al. Repeated Sequences in Linear GP Genomes , 2004 .
[86] Paulo Carvalho,et al. GRASPm: an efficient algorithm for exact pattern-matching in genomic sequences , 2009, Int. J. Bioinform. Res. Appl..
[87] John M. Hancock. Genome size and the accumulation of simple sequence repeats: implications of new data from genome sequencing projects , 2002, Genetica.
[88] Robert L. Mercer,et al. Class-Based n-gram Models of Natural Language , 1992, CL.
[89] Bin Ma,et al. DNACompress: fast and effective DNA sequence compression , 2002, Bioinform..
[90] Ioan Tabus,et al. DNA sequence compression using the normalized maximum likelihood model for discrete regression , 2003, Data Compression Conference, 2003. Proceedings. DCC 2003.
[91] Stephen Benz,et al. A DNA Motif Lexicon: cataloguing and annotating sequences. , 2004, In silico biology.
[92] Maria de Sousa Vieira,et al. Statistics of DNA sequences: a low-frequency analysis. , 1999, cond-mat/9905074.
[93] P Bork,et al. Automated extraction of information in molecular biology , 2000, FEBS letters.
[94] Yuriy L. Orlov,et al. Complexity: an internet resource for analysis of DNA sequence complexity , 2004, Nucleic Acids Res..
[95] Gonzalo Navarro,et al. Fast and flexible string matching by combining bit-parallelism and suffix automata , 2000, JEAL.
[96] Abraham Lempel,et al. Compression of individual sequences via variable-rate coding , 1978, IEEE Trans. Inf. Theory.
[97] Jeremy Buhler,et al. Designing Multiple Simultaneous Seeds for DNA Similarity Search , 2005, J. Comput. Biol..
[98] Costas S. Iliopoulos,et al. Finding Approximate Occurrences of a Pattern That Contains Gaps , 2003 .
[99] S Karlin,et al. Patchiness and correlations in DNA sequences , 1993, Science.
[100] Vladimir I. Levenshtein,et al. Binary codes capable of correcting deletions, insertions, and reversals , 1965 .
[101] Khalid Sayood. Lossless Compression Handbook , 2003 .
[102] Ivo Grosse,et al. Repeats and correlations in human DNA sequences. , 2003, Physical review. E, Statistical, nonlinear, and soft matter physics.
[103] Jorma Tarhio,et al. Alternative Algorithms for Bit-Parallel String Matching , 2003, SPIRE.
[104] Huiru Zheng,et al. An assessment of machine and statistical learning approaches to inferring networks of protein-protein interactions , 2006, J. Integr. Bioinform..
[105] Thomas L. Madden,et al. Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. , 1997, Nucleic acids research.
[106] Gonzalo Navarro,et al. A Bit-Parallel Approach to Suffix Automata: Fast Extended String Matching , 1998, CPM.
[107] S. Acharya. Some Aspects of Physicochemical Properties of DNA and RNA , 2006 .
[108] H. Herzel,et al. Estimating the entropy of DNA sequences. , 1997, Journal of theoretical biology.
[109] D. Krane,et al. Fundamental Concepts of Bioinformatics , 2002 .
[110] Daniel Sunday,et al. A very fast substring search algorithm , 1990, CACM.
[111] J. Shapiro. A 21st century view of evolution: genome system architecture, repetitive DNA, and natural genetic engineering. , 2005, Gene.
[112] Gonzalo Navarro,et al. A guided tour to approximate string matching , 2001, CSUR.
[113] David Sankoff,et al. Time Warps, String Edits, and Macromolecules: The Theory and Practice of Sequence Comparison , 1983 .
[114] W. Ebeling,et al. Finite sample effects in sequence analysis , 1994 .
[115] Joshua Goodman,et al. A bit of progress in language modeling , 2001, Comput. Speech Lang..
[116] Daniel G. Brown. Optimizing Multiple Seeds for Protein Homology Search , 2005, TCBB.
[117] M. Turker,et al. Tandem B1 Elements Located in a Mouse Methylation Center Provide a Target for de Novo DNA Methylation* , 1999, The Journal of Biological Chemistry.
[118] Indranil Mukhopadhyay,et al. Word organization in coding DNA: A mathematical model , 2006, Theory in Biosciences.
[119] Jeremy Buhler,et al. Designing seeds for similarity search in genomic DNA , 2005, J. Comput. Syst. Sci..
[120] Valeria De Fonzo,et al. Hidden Markov Models in Bioinformatics , 2007 .
[121] Xin Chen,et al. An information-based sequence distance and its application to whole mitochondrial genome phylogeny , 2001, Bioinform..
[122] J. Jurka,et al. Microsatellites in different eukaryotic genomes: survey and analysis. , 2000, Genome research.
[123] Gary Benson,et al. Tandem repeats over the edit distance , 2007, Bioinform..
[124] P Bernaola-Galván,et al. Study of statistical correlations in DNA sequences. , 2002, Gene.
[125] Frans M. J. Willems,et al. The context-tree weighting method: basic properties , 1995, IEEE Trans. Inf. Theory.
[126] John C. Kieffer,et al. Ergodic behavior of graph entropy , 1997 .
[127] V. R. Chechetkin,et al. LEVELS OF ORDERING IN CODING AND NONCODING REGIONS OF DNA SEQUENCES , 1996 .
[128] Limsoon Wong,et al. Accomplishments and challenges in literature data mining for biology , 2002, Bioinform..
[129] Werner Ebeling,et al. Entropy and complexity of finite sequences as fluctuating quantities. , 2002, Bio Systems.
[130] Antoine Danchin,et al. Genome structures, operating systems and the image of the machine , 2004 .
[131] Anthony Jf Griffiths,et al. Modern Genetic Analysis , 1998 .
[132] Ian H. Witten,et al. Arithmetic coding revisited , 1998, TOIS.
[133] R. Nigel Horspool,et al. Practical fast searching in strings , 1980, Softw. Pract. Exp..
[134] Thierry Lecroq,et al. Experimental results on string matching algorithms , 1995, Softw. Pract. Exp..
[135] I. Good. THE POPULATION FREQUENCIES OF SPECIES AND THE ESTIMATION OF POPULATION PARAMETERS , 1953 .
[136] W R Pearson,et al. Flexible sequence similarity searching with the FASTA3 program package. , 2000, Methods in molecular biology.
[137] W. Stemmer,et al. Directed evolution of proteins by exon shuffling , 2001, Nature Biotechnology.
[138] Szymon Grabowski,et al. Revisiting dictionary‐based compression , 2005, Softw. Pract. Exp..
[139] J. Goodman,et al. The long (LINEs) and the short (SINEs) of it: altered methylation as a precursor to toxicity. , 2003, Toxicological sciences : an official journal of the Society of Toxicology.
[140] Stefano Lonardi,et al. Compression of biological sequences by greedy off-line textual substitution , 2000, Proceedings DCC 2000. Data Compression Conference.
[141] Gregory Kucherov,et al. Improved hit criteria for DNA local alignment , 2004, BMC Bioinformatics.
[142] D. Leach. Long DNA palindromes, cruciform structures, genetic instability and secondary structure repair , 1994, BioEssays : news and reviews in molecular, cellular and developmental biology.
[143] Alfonso Valencia,et al. Information extraction in molecular biology , 2002, Briefings Bioinform..
[144] Hanspeter Herzel,et al. Correlations in DNA sequences: The role of protein coding segments , 1997 .
[145] D R Powell,et al. Discovering simple DNA sequences by compression. , 1998, Pacific Symposium on Biocomputing. Pacific Symposium on Biocomputing.
[146] Shigehiko Kanaya,et al. Statistical Analysis of Genomic Information: Various Periodicities in DNA Sequence , 2001 .
[147] Ian H. Witten,et al. Text Compression , 1990, 125 Problems in Text Algorithms.
[148] M. Ridley,et al. Genome: The Autobiography of a Species In 23 Chapters , 1999 .
[149] J. Lobry. THE BLACK HOLE OF SYMMETRIC MOLECULAR EVOLUTION , 2000 .
[150] Zaher Dawy,et al. Genomic analysis using methods from information theory , 2004, Information Theory Workshop.
[151] Toshiko Matsumoto,et al. Biological sequence compression algorithms. , 2000, Genome informatics. Workshop on Genome Informatics.
[152] Dachao Li,et al. Conditional LZ Complexity of DNA Sequences Analysis and its Application in Phylogenetic Tree Reconstruction , 2008, 2008 International Conference on BioMedical Engineering and Informatics.
[153] Bin Ma,et al. Patternhunter Ii: Highly Sensitive and Fast Homology Search , 2004, J. Bioinform. Comput. Biol..
[154] David Loewenstern,et al. Significantly Lower Entropy Estimates for Natural DNA Sequences , 1999, J. Comput. Biol..
[155] Jorma Rissanen,et al. Generalized Kraft Inequality and Arithmetic Coding , 1976, IBM J. Res. Dev..
[156] Lukas Wagner,et al. A Greedy Algorithm for Aligning DNA Sequences , 2000, J. Comput. Biol..
[157] Chiara Romualdi,et al. Differential expression of genes coding for ribosomal proteins in different human tissues , 2001, Bioinform..
[158] R. Bellman. Dynamic programming. , 1957, Science.
[159] Eric Coissac,et al. Origin and fate of repeats in bacteria , 2002, Nucleic Acids Res..
[160] H Herzel,et al. Information content of protein sequences. , 2000, Journal of theoretical biology.
[161] M. Singer. SINEs and LINEs: Highly repeated short and long interspersed sequences in mammalian genomes , 1982, Cell.
[162] C Patience,et al. Our retroviral heritage. , 1997, Trends in genetics : TIG.
[163] Ian H. Witten,et al. Data Compression Using Adaptive Coding and Partial String Matching , 1984, IEEE Trans. Commun..
[164] T. Govezensky,et al. Statistical properties of DNA sequences revisited: the role of inverse bilateral symmetry in bacterial chromosomes , 2004, q-bio/0408014.
[165] Michael D. Hendy,et al. Compressing DNA sequence databases with coil , 2007, BMC Bioinformatics.
[166] Lila L. Gatlin,et al. Information theory and the living system , 1972 .
[167] Louxin Zhang,et al. Good spaced seeds for homology search , 2004, Proceedings. Fourth IEEE Symposium on Bioinformatics and Bioengineering.
[168] Max Dauchet,et al. A first step toward chromosome analysis by compression algorithms , 1995, Proceedings First International Symposium on Intelligence in Neural and Biological Systems. INBS'95.
[169] S. B. Needleman,et al. A general method applicable to the search for similarities in the amino acid sequence of two proteins. , 1970, Journal of molecular biology.
[170] Samuel Karlin,et al. Comparative statistics for DNA and protein sequences: multiple sequence analysis , 1985 .
[171] Timothy B. Stockwell,et al. The Sequence of the Human Genome , 2001, Science.
[172] Richard W. Hamming,et al. Error detecting and error correcting codes , 1950 .
[173] Claude E. Shannon,et al. The Mathematical Theory of Communication , 1950 .
[174] Bin Ma,et al. Optimizing Multiple Spaced Seeds for Homology Search , 2004, CPM.
[175] Mike Alder,et al. Natural Language Grammatical Inference , 1994 .