Paper information - Novel Combinatorial and Information‐Theoretic Alignment‐Free Distances for Biological Data Mining - 字舞流文

Novel Combinatorial and Information‐Theoretic Alignment‐Free Distances for Biological Data Mining

Chiara Epifanio | Raffaele Giancarlo | Marinella Sciortino | Alessandra Gabriele

[1] Jun Wang, et al. WSE, a new sequence distance measure based on word frequencies, 2008, Mathematical Biosciences.

[2] M. S. Waterman, et al. Identification of common molecular subsequences, 1981, Journal of molecular biology.

[3] David R. Gilbert, et al. Motif-based searching in TOPS protein topology databases, 1999, Bioinform.

[4] Matteo Comin, et al. Mining, compressing and classifying with extensible motifs, 2006, Algorithms for Molecular Biology.

[5] W. Kabsch, et al. Dictionary of protein secondary structure: Pattern recognition of hydrogen‐bonded and geometrical features, 1983, Biopolymers.

[6] Natalio Krasnogor, et al. Measuring the similarity of protein structures by means of the universal similarity metric, 2004, Bioinform.

[7] Antonio Restivo, et al. A New Combinatorial Approach to Sequence Comparison, 2007, Theory of Computing Systems.

[8] Khalid Sayood, et al. A new sequence distance measure for phylogenetic tree construction, 2003, Bioinform.

[9] M. Waterman, et al. Distributional regimes for the number of k-word matches between two random sequences, 2002, Proceedings of the National Academy of Sciences of the United States of America.

[10] B. Steipe, et al. Nh3D: A reference dataset of non-homologous protein structures, 2005, BMC Structural Biology.

[11] E. Myers, et al. Basic local alignment search tool, 1990, Journal of molecular biology.

[12] Gonzalo Navarro, et al. Compressed full-text indexes, 2007, CSUR.

[13] Long Li, et al. REDfly: a Regulatory Element Database for Drosophila, 2006, Bioinform.

[14] Dong Xu, et al. Phylogenetic analysis using complete signature information of whole genomes and clustered Neighbour-Joining method, 2006, Int. J. Bioinform. Res. Appl.

[15] Jonas S. Almeida, et al. Alignment-free sequence comparison-a review, 2003, Bioinform.

[16] Paul M. B. Vitányi, et al. Clustering by compression, 2003, IEEE Transactions on Information Theory.

[17] O. Gotoh. An improved algorithm for matching biological sequences, 1982, Journal of molecular biology.

[18] Yanchun Yang, et al. Markov model plus k-word distributions: a synergy that produces novel statistical measures for sequence comparison, 2008, Bioinform.

[19] T. P. Flores, et al. Protein structural topology: Automated analysis and diagrammatic representation, 2008, Protein science : a publication of the Protein Society.

[20] H. Wilf, et al. Uniqueness theorems for periodic functions, 1965.

[21] W. Pearson, et al. Sensitivity and selectivity in protein structure comparison, 2004, Protein science : a publication of the Protein Society.

[22] Steve Baker, et al. Integrated gene and species phylogenies from unaligned whole genome protein sequences, 2002, Bioinform.

[23] Jacques van Helden, et al. Metrics for comparing regulatory sequences on the basis of pattern counts, 2004, Bioinform.

[24] Shengrui Wang, et al. CLUSS: Clustering of protein sequences based on a new similarity measure, 2007, BMC Bioinformatics.

[25] Xiang Fang, et al. An improved string composition method for sequence comparison, 2008, BMC Bioinformatics.

[26] T. P. Flores, et al. An algorithm for automatically generating protein topology cartoons, 1994, Protein engineering.

[27] S. B. Needleman, et al. A general method applicable to the search for similarities in the amino acid sequence of two proteins, 1970, Journal of molecular biology.

[28] Tuan D. Pham, et al. A probabilistic measure for alignment-free sequence comparison, 2004, Bioinform.

[29] J. M. Thornton, et al. An atlas of protein topology cartoons available on the World-Wide Web, 1998, Trends in biochemical sciences.

[30] W. Pearson. Rapid and sensitive sequence comparison with FASTP and FASTA, 1990, Methods in enzymology.

[31] Sylvain Forêt, et al. Asymptotic behaviour and optimal word size for exact and approximate word matches between random sequences, 2006, BMC Bioinformatics.

[32] M. Kimura. A simple method for estimating evolutionary rates of base substitutions through comparative studies of nucleotide sequences, 1980, Journal of Molecular Evolution.

[33] B. Blaisdell. A measure of the similarity of sets of sequences not requiring sequence alignment, 1986, Proceedings of the National Academy of Sciences of the United States of America.

[34] James R. Cole, et al. The Ribosomal Database Project (RDP-II): previewing a new autoaligner that allows regular updates and the new prokaryotic taxonomy, 2003, Nucleic Acids Res.

[35] S. Pääbo, et al. Conflict Among Individual Mitochondrial Proteins in Resolving the Phylogeny of Eutherian Orders, 1998, Journal of Molecular Evolution.

[36] Z. Xuan, et al. Phylogeny Based on Whole Genome as inferred from Complete Information Set Analysis, 2002, Journal of biological physics.

[37] Raffaele Giancarlo, et al. Textual data compression in computational biology: a synopsis, 2009, Bioinform.

[38] Frances M. G. Pearl, et al. The CATH Domain Structure Database and related resources Gene3D and DHS provide comprehensive domain family information for genome analysis, 2004, Nucleic Acids Res.

[39] L. Alexander Lyznik, et al. ASF/SF2-like maize pre-mRNA splicing factors affect splice site utilization and their transcripts are alternatively spliced, 2004, Gene.

[40] Antonio Restivo, et al. Words and forbidden factors, 2002, Theor. Comput. Sci.

[41] Tiee-Jian Wu, et al. Statistical Measures of DNA Sequence Dissimilarity under Markov Chain Models of Base Composition, 2001, Biometrics.

[42] Susan R. Wilson, et al. Approximate word matches between two random sequences, 2008.

[43] Huey-Wen Yien, et al. Linguistic analysis of the human heartbeat using frequency and rank order statistics, 2003, Physical review letters.

[44] David Burstein, et al. The Average Common Substring Approach to Phylogenomic Reconstruction, 2006, J. Comput. Biol.

[45] Klara Kedem, et al. Finding the Consensus Shape for a Protein Family, 2003, Algorithmica.

[46] Michael S. Waterman, et al. Introduction to computational biology, 1995.

[47] S. Henikoff, et al. Amino acid substitution matrices from protein blocks, 1992, Proceedings of the National Academy of Sciences of the United States of America.

[48] Jonas S. Almeida, et al. Comparative evaluation of word composition distances for the recognition of SCOP relationships, 2004, Bioinform.

[49] Alberto Apostolico, et al. Fast algorithms for computing sequence distances by exhaustive substring composition, 2008, Algorithms for Molecular Biology.

[50] C. J. Burden, et al. Asymptotic Behavior of k-Word Matches Between two Uniformly Distributed Sequences, 2007, Journal of Applied Probability.

[51] Gilles Didier, et al. Local Decoding of Sequences and Alignment-Free Comparison, 2006, J. Comput. Biol.

[52] Bin Ma, et al. The similarity metric, 2001, IEEE Transactions on Information Theory.

[53] J. Qi, et al. Whole Proteome Prokaryote Phylogeny Without Sequence Alignment: A K-String Composition Approach, 2003, Journal of Molecular Evolution.

[54] Thomas L. Madden, et al. Gapped BLAST and PSI-BLAST: a new generation of protein database search programs, 1997, Nucleic acids research.

[55] Saurabh Sinha, et al. A statistical method for alignment-free comparison of regulatory sequences, 2007, ISMB/ECCB.

[56] Xin Chen, et al. An information-based sequence distance and its application to whole mitochondrial genome phylogeny, 2001, Bioinform.

[57] D. Lipman, et al. Improved tools for biological sequence comparison, 1988, Proceedings of the National Academy of Sciences of the United States of America.

[58] Hong Luo, et al. CVTree: a phylogenetic tree reconstruction tool based on whole genomes, 2004, Nucleic Acids Res.

[59] Edmund K. Burke, et al. ProCKSI: a decision support system for Protein (Structure) Comparison, Knowledge, Similarity and Information, 2007, BMC Bioinformatics.

[60] M. Zalis, et al. Visualizing and quantifying molecular goodness-of-fit: small-probe contact dots with explicit hydrogen atoms, 1999, Journal of molecular biology.

[61] Raffaele Giancarlo, et al. Compression-based classification of biological sequences and structures via the Universal Similarity Metric: experimental assessment, 2007, BMC Bioinformatics.