Searching for Compact Hierarchical Structures in DNA by means of the Smallest Grammar Problem
[1] Mario Gimona,et al. Protein linguistics — a grammar for modular protein assembly? , 2006, Nature Reviews Molecular Cell Biology.
[2] Alberto Apostolico,et al. Incremental Paradigms of Motif Discovery , 2004, J. Comput. Biol..
[3] Abhi Shelat,et al. The smallest grammar problem , 2005, IEEE Transactions on Information Theory.
[4] Menno van Zaanen,et al. Comparing Two Unsupervised Grammar Induction Systems: Alignment-Based Learning vs. EMILE , 2001 .
[5] Harold W. Kuhn,et al. The Hungarian method for the assignment problem , 1955, 50 Years of Integer Programming.
[6] H. S. Heaps,et al. A comparison of algorithms for data base compression by use of fragments as language elements , 1974, Inf. Storage Retr..
[7] Eytan Ruppin,et al. Unsupervised learning of natural languages , 2006 .
[8] A. H. Lipkus. A proof of the triangle inequality for the Tanimoto distance , 1999 .
[9] Paul Pritchard. On Computing the Subset Graph of a Collection of Sets , 1999, J. Algorithms.
[10] Gad M. Landau,et al. Random access to grammar-compressed strings , 2010, SODA '11.
[11] Aleksandar Milosavljevic,et al. Discovery by Minimal Length Encoding: A case study in molecular evolution , 1993, Machine Learning.
[12] A. Apostolico,et al. Off-line compression by greedy textual substitution , 2000, Proceedings of the IEEE.
[13] Ming Gu,et al. An efficient algorithm for dynamic text indexing , 1994, SODA '94.
[14] Eugene W. Myers,et al. Suffix arrays: a new method for on-line string searches , 1993, SODA '90.
[15] W. Ebeling,et al. On grammars, complexity, and information measures of biological macromolecules , 1980 .
[16] Sen Zhang,et al. Fast and Space Efficient Linear Suffix Array Construction , 2008, Data Compression Conference (dcc 2008).
[17] Matthias Gallé,et al. Searching for smallest grammars on large sequences and application to DNA , 2012, J. Discrete Algorithms.
[18] Hsiang-Chuan Liu,et al. Scaling Behavior of Maximal Repeat Distributions in Genomic Sequences , 2008, Int. J. Cogn. Informatics Nat. Intell..
[19] Enno Ohlebusch,et al. Replacing suffix trees with enhanced suffix arrays , 2004, J. Discrete Algorithms.
[20] Khalid Sayood,et al. Data Compression Concepts and Algorithms and Their Applications to Bioinformatics , 2009, Entropy.
[21] Matteo Comin,et al. Motifs in Ziv-Lempel-Welch Clef , 2004, Data Compression Conference, 2004. Proceedings. DCC 2004.
[22] Ian H. Witten,et al. Browsing in digital libraries: a phrase-based approach , 1997, DL '97.
[23] Ayumi Shinohara,et al. Collage system: a unifying framework for compressed pattern matching , 2003, Theor. Comput. Sci..
[24] Raju Uma,et al. A New Algorithm For Data Compression , 2013 .
[25] Alexander Clark,et al. Three Learnable Models for the Description of Language , 2010, LATA.
[26] Paolo Ferragina. Data Structures: Time, I/Os, Entropy, Joules! , 2010, ESA.
[27] Igor Potapov,et al. Real-time traversal in grammar-based compressed files , 2005, Data Compression Conference.
[28] Chris Mellish,et al. Basic Gene Grammars and DNA-ChartParser for language processing of Escherichia coli promoter DNA sequences , 2001, Bioinform..
[29] D Larhammar,et al. Lack of biological significance in the 'linguistic features' of noncoding DNA--a quantitative analysis. , 1996, Nucleic acids research.
[30] F. Crick. Central Dogma of Molecular Biology , 1970, Nature.
[31] Matthew Simon. Emergent computation - emphasizing bioinformatics , 2005, Biological and medical physics biomedical engineering.
[32] Ralph Grishman,et al. A Procedure for Quantitatively Comparing the Syntactic Coverage of English Grammars , 1991, HLT.
[33] Stéphane Grumbach,et al. A New Challenge for Compression Algorithms: Genetic Sequences , 1994, Inf. Process. Manag..
[34] E. Mark Gold,et al. Complexity of Automaton Identification from Given Data , 1978, Inf. Control..
[35] Stephen F. Bush,et al. Kolmogorov complexity estimation and application for information system security , 2003 .
[36] Edward M. McCreight,et al. A Space-Economical Suffix Tree Construction Algorithm , 1976, JACM.
[37] Toshiko Matsumoto,et al. Biological sequence compression algorithms. , 2000, Genome informatics. Workshop on Genome Informatics.
[38] Robert A. Wagner,et al. Common phrases and minimum-space text storage , 1973, CACM.
[39] H. Kuhn. The Hungarian method for the assignment problem , 1955 .
[40] Pedro M. Domingos. The Role of Occam's Razor in Knowledge Discovery , 1999, Data Mining and Knowledge Discovery.
[41] Patrick Argos,et al. The Language of Protein Folding: Many Forked Tongues , 1992, Comput. Chem..
[42] Judith Roof,et al. The Poetics of DNA , 2007 .
[43] Noam Chomsky. Three models for the description of language , 1956, IRE Trans. Inf. Theory.
[44] Amaury Habrard,et al. A Polynomial Algorithm for the Inference of Context Free Languages , 2008, ICGI.
[45] Maxime Crochemore,et al. Bases of motifs for generating repeated patterns with wild cards , 2005, IEEE/ACM Transactions on Computational Biology and Bioinformatics.
[46] Matías Bordese. Análisis y alternativas para la compresión de XML , 2009 .
[47] Uzi Vishkin,et al. Efficient approximate and dynamic matching of patterns using a labeling paradigm , 1996, Proceedings of 37th Conference on Foundations of Computer Science.
[48] S Ji,et al. The Linguistics of DNA: Words, Sentences, Grammar, Phonetics, and Semantics , 1999, Annals of the New York Academy of Sciences.
[49] Eli Upfal,et al. MADMX: A Novel Strategy for Maximal Dense Motif Extraction , 2009, WABI.
[50] Jacques Nicolas,et al. Browsing repeats in genomes: Pygram and an application to non-coding region analysis , 2006, BMC Bioinformatics.
[51] Dan Gusfield,et al. Algorithms on Strings, Trees, and Sequences - Computer Science and Computational Biology , 1997 .
[52] Pamela C. Cosman,et al. Universal lossless compression via multilevel pattern matching , 2000, IEEE Trans. Inf. Theory.
[53] Ian H. Witten,et al. Linear-time, incremental hierarchy inference for compression , 1997, Proceedings DCC '97. Data Compression Conference.
[54] Shmuel Tomi Klein,et al. Compression, information theory, and grammars: a unified approach , 1990, TOIS.
[55] Jonathan Miller,et al. MicroRNA Target Detection and Analysis for Genes Related to Breast Cancer Using MDLcompress , 2007, EURASIP J. Bioinform. Syst. Biol..
[56] Hiroki Arimura,et al. An efficient polynomial space and polynomial delay algorithm for enumeration of maximal motifs in a sequence , 2007, J. Comb. Optim..
[57] Matthias Gallé,et al. The Smallest Grammar Problem as Constituents Choice and Minimal Grammar Parsing , 2011, Algorithms.
[58] Wing-Kai Hon,et al. Compressed indexes for dynamic text collections , 2007, TALG.
[59] Hiroshi Sakamoto,et al. A Space-Saving Linear-Time Algorithm for Grammar-Based Compression , 2004, SPIRE.
[60] Yong Zhang,et al. DNA sequence compression using the Burrows-Wheeler Transform , 2002, Proceedings. IEEE Computer Society Bioinformatics Conference.
[61] Ian H. Witten,et al. Inferring lexical and grammatical structure from sequences , 1997, Proceedings. Compression and Complexity of SEQUENCES 1997 (Cat. No.97TB100171).
[62] Paul M. B. Vitányi,et al. Clustering by compression , 2003, IEEE Transactions on Information Theory.
[63] Xin Chen,et al. A compression algorithm for DNA sequences , 2001, IEEE Engineering in Medicine and Biology Magazine.
[64] Amaury Habrard,et al. A Note on Contextual Binary Feature Grammars , 2009 .
[65] Cristian S. Calude,et al. Finite-State Complexity and the Size of Transducers , 2010, DCFS.
[66] Elena Rivas,et al. The language of RNA: a formal grammar that includes pseudoknots , 2000, Bioinform..
[67] Dan Klein,et al. Corpus-Based Induction of Syntactic Structure: Models of Dependency and Constituency , 2004, ACL.
[68] Robert D. Cameron. Source encoding using syntactic information source models , 1988, IEEE Trans. Inf. Theory.
[69] Ming Li,et al. An Introduction to Kolmogorov Complexity and Its Applications , 2019, Texts in Computer Science.
[70] Jorma Rissanen,et al. Minimum Description Length Principle , 2010, Encyclopedia of Machine Learning.
[71] T G Dewey,et al. The Shannon information entropy of protein sequences. , 1996, Biophysical journal.
[72] Geoffrey Sampson,et al. A proposal for improving the measurement of parse accuracy , 2000 .
[73] Alistair Moffat,et al. Off-line dictionary-based compression , 1999, Proceedings of the IEEE.
[74] Sérgio Deusdado,et al. Análise e compressão de sequências genómicas , 2008 .
[75] Atsuhiro Takasu,et al. Approximating Tree Edit Distance through String Edit Distance , 2008, Algorithmica.
[76] Philip Bille,et al. A survey on tree edit distance and related problems , 2005, Theor. Comput. Sci..
[77] David B. Searls,et al. Linguistic approaches to biological sequences , 1997, Comput. Appl. Biosci..
[78] Tomasz Müldner,et al. AXECHOP: a grammar-based compressor for XML , 2005, Data Compression Conference.
[79] E. Mark Gold,et al. Language Identification in the Limit , 1967, Inf. Control..
[80] Srinivas Aluru,et al. Space efficient linear time construction of suffix arrays , 2005, J. Discrete Algorithms.
[81] M. Steel,et al. Subtree Transfer Operations and Their Induced Metrics on Evolutionary Trees , 2001 .
[82] Gerald Gazdar,et al. Applicability of Indexed Grammars to Natural Languages , 1988 .
[83] Aristotelis Tsirigos,et al. Short blocks from the noncoding parts of the human genome have instances within nearly all known genes and relate to biological processes. , 2006, Proceedings of the National Academy of Sciences of the United States of America.
[84] Makoto Kanazawa,et al. The Copying Power of Well-Nested Multiple Context-Free Grammars , 2010, LATA.
[85] Kunihiko Sadakane,et al. Faster suffix sorting , 2007, Theoretical Computer Science.
[86] Roberto Grossi,et al. On Updating Suffix Tree Labels , 1998, Theor. Comput. Sci..
[87] William F. Smyth,et al. A taxonomy of suffix array construction algorithms , 2007, CSUR.
[88] R. Eyraud. Inférence grammatical de langages hors-contextes , 2006 .
[89] Rens Bod,et al. The Data-Oriented Parsing Approach: Theory and Application , 2008, Computational Intelligence: A Compendium.
[90] Eric Steinbrecher,et al. Implementation of an Incremental MDL-Based Two Part Compression Algorithm for Model Inference , 2009, 2009 Data Compression Conference.
[91] Eric Lehman,et al. Approximation algorithms for grammar-based data compression , 2002 .
[92] Jean-Paul Delahaye,et al. A guaranteed compression scheme for repetitive DNA sequences , 1996, Proceedings of Data Compression Conference - DCC '96.
[93] E N Trifonov,et al. The multiple codes of nucleotide sequences. , 1989, Bulletin of mathematical biology.
[94] Craig G. Nevill-Manning,et al. Compression and Explanation Using Hierarchical Grammars , 1997, Comput. J..
[95] Dana Angluin,et al. Learning Regular Sets from Queries and Counterexamples , 1987, Inf. Comput..
[96] M A Nowak,et al. Explaining "linguistic features" of noncoding DNA. , 1996, Science.
[97] F Flam,et al. Hints of a language in junk DNA. , 1994, Science.
[98] Pierre Peterlongo,et al. In-Place Update of Suffix Array while Recoding Words , 2008, Int. J. Found. Comput. Sci..
[99] Gad M. Landau,et al. Unified Compression-Based Acceleration of Edit-Distance Computation , 2011, Algorithmica.
[100] Johann Pelfrêne,et al. Extracting approximate patterns , 2005, J. Discrete Algorithms.
[101] Matthias Gallé,et al. Choosing Word Occurrences for the Smallest Grammar Problem , 2010, LATA.
[102] Raffaele Giancarlo,et al. Textual data compression in computational biology: a synopsis , 2009, Bioinform..
[103] Gonzalo Navarro,et al. Re-pair Achieves High-Order Entropy , 2008, Data Compression Conference (dcc 2008).
[104] James A. Storer,et al. Data compression via textual substitution , 1982, JACM.
[105] David Loewenstern,et al. Significantly lower entropy estimates for natural DNA sequences , 1997, Proceedings DCC '97. Data Compression Conference.
[106] Gregory Stephanopoulos,et al. A linguistic model for the rational design of antimicrobial peptides , 2006, Nature.
[107] William F. Smyth,et al. Fast Optimal Algorithms for Computing All the Repeats in a String , 2008, Stringology.
[108] Jacques Nicolas,et al. CRISPI: a CRISPR interactive database , 2009, Bioinform..
[109] Pedro A. Pury,et al. Statistical keyword detection in literary corpora , 2007, ArXiv.
[110] Frederick P. Brooks,et al. Three great challenges for half-century-old computer science , 2003, JACM.
[111] Abraham Lempel,et al. Compression of individual sequences via variable-rate coding , 1978, IEEE Trans. Inf. Theory.
[112] Yasubumi Sakakibara,et al. Efficient Learning of Context-Free Grammars from Positive Structural Examples , 1992, Inf. Comput..
[113] Paul Pritchard,et al. A Simple Sub-Quadratic Algorithm for Computing the Subset Partial Order , 1995, Inf. Process. Lett..
[114] I.H. Witten,et al. On-line and off-line heuristics for inferring hierarchies of repetitions in sequences , 2000, Proceedings of the IEEE.
[115] Philip Gage,et al. A new algorithm for data compression , 1994 .
[116] Timothy C. Bell,et al. A corpus for the evaluation of lossless compression algorithms , 1997, Proceedings DCC '97. Data Compression Conference.
[117] Jeong Seop Sim. Time and Space Efficient Search for Small Alphabets with Suffix Arrays , 2005, FSKD.
[118] M. Nowak,et al. No signs of hidden language in noncoding DNA. , 1996, Physical review letters.
[119] Franco P. Preparata,et al. Data structures and algorithms for the string statistics problem , 1996, Algorithmica.
[120] V. Brendel,et al. Genome structure described by formal languages. , 1984, Nucleic acids research.
[121] Matthias Gallé. A New Tree Distance Metric for Structural Comparison of Sequences , 2010, Structure Discovery in Biology: Motifs, Networks & Phylogenies.
[122] Ayumi Shinohara,et al. Linear-Time Text Compression by Longest-First Substitution , 2009, Algorithms.
[123] Giovanni Manzini,et al. Engineering a Lightweight Suffix Array Construction Algorithm , 2002, ESA.
[124] Jeffrey D. Ullman,et al. Introduction to Automata Theory, Languages and Computation , 1979 .
[125] Travis Gagie,et al. Grammar-Based Compression in a Streaming Model , 2009, LATA.
[126] J. Wolff. AN ALGORITHM FOR THE SEGMENTATION OF AN ARTIFICIAL LANGUAGE ANALOGUE , 1975 .
[127] G.J. Saulnier,et al. Minimum description length principles for detection and classification of FTP exploits , 2004, IEEE MILCOM 2004. Military Communications Conference, 2004..
[128] S Ji,et al. The cell as the smallest DNA-based molecular computer. , 1999, Bio Systems.
[129] A A Tsonis,et al. Is DNA a language? , 1997, Journal of theoretical biology.
[130] Jon Louis Bentley,et al. Data compression using long common strings , 1999, Proceedings DCC'99 Data Compression Conference (Cat. No. PR00096).
[131] Menno van Zaanen,et al. Bootstrapping structure into language: alignment-based learning , 2001, ArXiv.
[132] Paolo Ferragina,et al. Text Compression , 2009, Encyclopedia of Database Systems.
[133] Perrin Matthieu. Compression de séquences d'A.D.N. à base de grammaires minimales , 2010 .
[134] Daniel M. Yellin. Algorithms for subset testing and finding maximal sets , 1992, SODA '92.
[135] Volker Brendel,et al. Gnomic: a dictionary of genetic codes , 1986 .
[136] Yasubumi Sakakibara,et al. Learning context-free grammars using tabular representations , 2005, Pattern Recognit..
[137] Edward R. Fiala,et al. Data compression with finite windows , 1989, CACM.
[138] Hélène Touzet,et al. A Linear Tree Edit Distance Algorithm for Similar Ordered Trees , 2005, CPM.
[139] Behshad Behzadi,et al. DNA Compression Challenge Revisited: A Dynamic Programming Approach , 2005, CPM.
[140] O Popov,et al. Linguistic complexity of protein sequences as compared to texts of human languages. , 1996, Bio Systems.
[141] Pang Ko,et al. Linear Time Construction of Suffix Arrays , 2002 .
[142] Abraham Lempel,et al. A universal algorithm for sequential data compression , 1977, IEEE Trans. Inf. Theory.
[143] Trevor I. Dix,et al. A Simple Statistical Algorithm for Biological Sequence Compression , 2007, 2007 Data Compression Conference (DCC'07).
[144] Pierre Peterlongo,et al. Modeling local repeats on genomic sequences , 2008 .
[145] Gang Chen,et al. Lempel–Ziv Factorization Using Less Time & Space , 2008, Math. Comput. Sci..
[146] Anna Pagh,et al. Solving the String Statistics Problem in Time O(n log n) , 2002, ICALP.
[147] Colin de la Higuera,et al. Grammatical Inference: Learning Automata and Grammars , 2010 .
[148] Laurent Mouchard,et al. Dynamic Burrows-Wheeler Transform , 2008, Stringology.
[149] J. Collado-Vides,et al. Grammatical model of the regulation of gene expression. , 1992, Proceedings of the National Academy of Sciences of the United States of America.
[150] Craig G. Nevill-Manning,et al. Compression by induction of hierarchical grammars , 1994, Proceedings of IEEE Data Compression Conference (DCC'94).
[151] D. Searls,et al. Robots in invertebrate neuroscience , 2002, Nature.
[152] Gregory Kucherov,et al. Finding maximal repetitions in a word in linear time , 1999, 40th Annual Symposium on Foundations of Computer Science (Cat. No.99CB37039).
[153] Bin Ma,et al. PatternHunter: faster and more sensitive homology search , 2002, Bioinform..
[154] Maxime Crochemore,et al. A Comparative Study of Bases for Motif Inference in String Algorithmics , 2004 .
[155] David B. Searls,et al. The computational linguistics of biological sequences , 1993, ISMB 1995.
[156] Giovanni Manzini,et al. A simple and fast DNA compressor , 2004, Softw. Pract. Exp..
[157] Rafael Carrascosa. Gramáticas mínimas y descubrimiento de patrones , 2010 .
[158] En-Hui Yang,et al. Estimating DNA sequence entropy , 2000, SODA '00.
[159] Quanzhong Li,et al. Supporting efficient query processing on compressed XML files , 2005, SAC '05.
[160] Chan,et al. Can Zipf distinguish language from noise in noncoding DNA? , 1996, Physical review letters.
[161] Hiroshi Sakamoto,et al. A fully linear-time approximation algorithm for grammar-based compression , 2003, J. Discrete Algorithms.
[162] Chun Chen,et al. RNACompress: Grammar-based compression and informational complexity measurement of RNA secondary structure , 2008, BMC Bioinformatics.
[163] Jyrki Katajainen,et al. An analysis of the longest match and the greedy heuristics in text encoding , 1992, JACM.
[164] Sean R. Eddy,et al. Evaluation of several lightweight stochastic context-free grammars for RNA secondary structure prediction , 2004, BMC Bioinformatics.
[165] Menno van Zaanen,et al. ABL: Alignment-Based Learning , 2000, COLING.
[166] Amir Averbuch,et al. XML syntax conscious compression , 2006, Data Compression Conference (DCC'06).
[167] Temple F. Smith. Occam's razor , 1980, Nature.
[168] Matteo Comin,et al. VARUN: Discovering Extensible Motifs under Saturation Constraints , 2010, IEEE/ACM Transactions on Computational Biology and Bioinformatics.
[169] Tomasz Müldner,et al. A Grammar-based Approach for Compressing XML , 2005 .
[170] Christopher D. Manning,et al. The unsupervised learning of natural language structure , 2005 .
[171] Michael D. Hendy,et al. Compressing DNA sequence databases with coil , 2007, BMC Bioinformatics.
[172] Sherif Sakr,et al. XML compression techniques: A survey and comparison , 2009, J. Comput. Syst. Sci..
[173] Wojciech Rytter. Application of Lempel-Ziv factorization to the approximation of grammar-based compression , 2003, Theor. Comput. Sci..
[174] H E Stanley,et al. Linguistic features of noncoding DNA sequences. , 1994, Physical review letters.
[175] Richard E. Ladner,et al. Grammar-based Compression of DNA Sequences , 2007 .
[176] Rudi Cilibrasi,et al. Statistical inference through data compression , 2007 .
[177] H. Judson. The Eighth Day of Creation: Makers of the Revolution in Biology , 2013 .
[178] Akihiko Konagaya,et al. DNA Data Compression in the Post Genome Era , 2001 .
[179] Dan R. Olsen,et al. Compressing semi-structured text using hierarchical phrase identifications , 1996, Proceedings of Data Compression Conference - DCC '96.
[180] Simon J. Puglisi,et al. An efficient, versatile approach to suffix sorting , 2008, JEAL.
[181] Pieter W. Adriaans,et al. The EMILE 4.1 Grammar Induction Toolbox , 2002, ICGI.
[182] Ricardo A. Baeza-Yates,et al. A Fast Set Intersection Algorithm for Sorted Sequences , 2004, CPM.
[183] M. Neumüller,et al. Compression of XML Data , 2001 .
[184] Hiroshi Sakamoto,et al. Improving Time and Space Complexity for Compressed Pattern Matching , 2006, ISAAC.
[185] David B. Searls,et al. Trees of life and of language , 2003 .
[186] D. Fisher. The Eighth Day of Creation: Makers of the Revolution in Biology , 1979 .
[187] M. A. Jiménez-Montaño,et al. On the syntactic structure of protein sequences and the concept of grammar complexity , 1984 .
[188] Miguel A. Martínez-Prieto,et al. Compressed q-Gram Indexing for Highly Repetitive Biological Sequences , 2010, 2010 IEEE International Conference on BioInformatics and BioEngineering.
[189] Ian H. Witten,et al. Phrase hierarchy inference and compression in bounded space , 1998, Proceedings DCC '98 Data Compression Conference (Cat. No.98TB100225).
[190] Goulven Kerbellec,et al. Apprentissage d'automates modélisant des familles de séquences protéiques. (Learning automata modelling families of protein sequences) , 2008 .
[191] Amr Elmasry,et al. The Subset Partial Order: Computing and Combinatorics , 2010, ANALCO.
[192] Hiroshi Sakamoto,et al. Context-sensitive grammar transform: Compression and pattern matching , 2008 .
[193] Jacques Nicolas,et al. Suffix-tree analyser (STAN): looking for nucleotidic and peptidic patterns in chromosomes , 2005 .
[194] David B. Searls,et al. String Variable Grammar: A Logic Grammar Formalism for the Biological Language of DNA , 1995, J. Log. Program..
[195] Ioan Tabus,et al. DNA sequence compression using the normalized maximum likelihood model for discrete regression , 2003, Data Compression Conference, 2003. Proceedings. DCC 2003.
[196] Gonzalo Navarro,et al. Compressed full-text indexes , 2007, CSUR.
[197] Julio Collado-Vides,et al. The search for a grammatical theory of gene regulation is formally justified by showing the inadequacy of context-free grammars , 1991, Comput. Appl. Biosci..
[198] Alberto Apostolico,et al. Optimal Offline Extraction of Irredundant Motif Bases , 2007, COCOON.
[199] Colin de la Higuera,et al. LARS: A learning algorithm for rewriting systems , 2006, Machine Learning.
[200] Matteo Comin,et al. Classification of protein sequences by means of irredundant patterns , 2010, BMC Bioinformatics.
[201] Bin Ma,et al. DNACompress: fast and effective DNA sequence compression , 2002, Bioinform..
[202] Jens Stoye,et al. An incomplex algorithm for fast suffix array construction , 2007, ALENEX/ANALCO.
[203] Michael Gribskov. The Language Metaphor in Sequence Analysis , 1992, Comput. Chem..
[204] Pieter W. Adriaans. Learning as Data Compression , 2007, CiE.
[205] Craig G. Nevill-Manning,et al. Inferring Sequential Structure , 1996 .
[206] Peter Sanders,et al. Simple Linear Work Suffix Array Construction , 2003, ICALP.
[207] D. B. Searls,et al. Reading the book of life , 2001, Bioinform..
[208] Dake He,et al. Efficient universal lossless data compression algorithms based on a greedy sequential grammar transform. 2. With context models , 2000, IEEE Trans. Inf. Theory.
[209] David B. Searls,et al. Grammatical Representations of Macromolecular Structure , 2006, J. Comput. Biol..
[210] Stefano Lonardi,et al. Compression of biological sequences by greedy off-line textual substitution , 2000, Proceedings DCC 2000. Data Compression Conference.
[211] Esko Ukkonen,et al. Maximal and minimal representations of gapped and non-gapped motifs of a string , 2009, Theor. Comput. Sci..
[212] Esko Ukkonen,et al. On-line construction of suffix trees , 1995, Algorithmica.
[213] Gonzalo Navarro,et al. Self-Indexed Grammar-Based Compression , 2011, Fundam. Informaticae.
[214] Trevor I. Dix,et al. Compression of Strings with Approximate Repeats , 1998, ISMB.
[215] Raffaele Giancarlo,et al. Compression-based classification of biological sequences and structures via the Universal Similarity Metric: experimental assessment , 2007, BMC Bioinformatics.
[216] David J. C. MacKay,et al. Information Theory, Inference, and Learning Algorithms , 2004, IEEE Transactions on Information Theory.
[217] Mihai Datcu,et al. A Similarity Measure Using Smallest Context-Free Grammars , 2010, 2010 Data Compression Conference.
[218] Wing-Kai Hon,et al. Compression, Indexing, and Retrieval for Massive String Data , 2010, CPM.
[219] Bin Ma,et al. The similarity metric , 2001, IEEE Transactions on Information Theory.
[220] Christian N. S. Pedersen,et al. Solving the String Statistics Problem in Time O(n log n) , 2002 .
[221] G. Korodi,et al. Compression of Annotated Nucleotide Sequences , 2007, IEEE/ACM Transactions on Computational Biology and Bioinformatics.
[222] Menno van Zaanen. ABL: Alignment-Based Learning , 2000, COLING.
[223] Ian H. Witten,et al. Identifying Hierarchical Structure in Sequences: A linear-time algorithm , 1997, J. Artif. Intell. Res..
[224] Abhi Shelat,et al. Approximating the smallest grammar: Kolmogorov complexity in natural models , 2002, STOC '02.
[225] Xiaohui Xie,et al. Human genomes as email attachments , 2022 .
[226] En-Hui Yang,et al. Grammar-based codes: A new class of universal lossless source codes , 2000, IEEE Trans. Inf. Theory.
[227] Ayumi Shinohara,et al. Simple Linear-Time Off-Line Text Compression by Longest-First Substitution , 2007, 2007 Data Compression Conference (DCC'07).
[228] Alberto Apostolico,et al. Fast gapped variants for Lempel-Ziv-Welch compression , 2007, Inf. Comput..
[229] Jean-Paul Delahaye,et al. Detection of significant patterns by compression algorithms: the case of approximate tandem repeats in DNA sequences , 1997, Comput. Appl. Biosci..
[230] Wing-Kai Hon,et al. I/O-Efficient Compressed Text Indexes: From Theory to Practice , 2010, 2010 Data Compression Conference.
[231] J. Wolff. Learning Syntax and Meanings Through Optimization and Distributional Analysis , 1988 .
[232] Alexander Clark,et al. Learning deterministic context free grammars: The Omphalos competition , 2006, Machine Learning.
[233] Matteo Comin,et al. Bridging Lossy and Lossless Compression by Motif Pattern Discovery , 2005, Electron. Notes Discret. Math..