Protein sequence comparison based on K-string dictionary.

[1]  C. Eckart, et al.  The approximation of one matrix by another of lower rank, 1936.

[2]  S. B. Needleman, et al.  A general method applicable to the search for similarities in the amino acid sequence of two proteins, 1970, Journal of molecular biology.

[3]  M. S. Waterman, et al.  Identification of common molecular subsequences, 1981, Journal of molecular biology.

[4]  O. Gotoh.  An improved algorithm for matching biological sequences, 1982, Journal of molecular biology.

[5]  N. Saitou, et al.  The neighbor-joining method: a new method for reconstructing phylogenetic trees, 1987, Molecular biology and evolution.

[6]  S. Pääbo, et al.  Conflict Among Individual Mitochondrial Proteins in Resolving the Phylogeny of Eutherian Orders, 1998, Journal of Molecular Evolution.

[7]  Elizabeth R. Jessup, et al.  Matrices, Vector Spaces, and Information Retrieval, 1999, SIAM Rev.

[8]  Tim J. P. Hubbard, et al.  SCOP: a Structural Classification of Proteins database, 1999, Nucleic Acids Res.

[9]  Tiee-Jian Wu, et al.  Statistical Measures of DNA Sequence Dissimilarity under Markov Chain Models of Base Composition, 2001, Biometrics.

[10]  Xin Chen, et al.  An information-based sequence distance and its application to whole mitochondrial genome phylogeny, 2001, Bioinform.

[11]  Jonas S. Almeida, et al.  Analysis of genomic sequences by Chaos Game Representation, 2001, Bioinform.

[12]  J. M. Fernández, et al.  Unfolding of titin domains explains the viscoelastic behavior of skeletal myofibrils, 2001, Biophysical journal.

[13]  Steve Baker, et al.  Integrated gene and species phylogenies from unaligned whole genome protein sequences, 2002, Bioinform.

[14]  J. Leader, et al.  A comprehensive vertebrate phylogeny using vector representations of protein sequences from whole genomes, 2002, Molecular biology and evolution.

[15]  Jonas S. Almeida, et al.  Alignment-free sequence comparison-a review, 2003, Bioinform.

[16]  Khalid Sayood, et al.  A new sequence distance measure for phylogenetic tree construction, 2003, Bioinform.

[17]  J. Qi, et al.  Whole Proteome Prokaryote Phylogeny Without Sequence Alignment: A K-String Composition Approach, 2003, Journal of Molecular Evolution.

[18]  Amir Niknejad, et al.  DNA sequence representation without degeneracy, 2003, Nucleic acids research.

[19]  Dejan Plavšić, et al.  Novel 2-D graphical representation of DNA sequences and their numerical characterization, 2003.

[20]  Zu-Guo Yu, et al.  Origin and phylogeny of chloroplasts revealed by a simple correlation analysis of complete genomes, 2003, Molecular biology and evolution.

[21]  Tuan D. Pham, et al.  A probabilistic measure for alignment-free sequence comparison, 2004, Bioinform.

[22]  Jun Cai, et al.  Classifying G-protein coupled receptors with bagging classification tree, 2004, Comput. Biol. Chem.

[23]  Bo Liao, et al.  New 2D graphical representation of DNA sequences, 2004, J. Comput. Chem.

[24]  Robert D. Finn, et al.  The Pfam protein families database, 2004, Nucleic Acids Res.

[25]  J. Qi, et al.  Whole genome molecular phylogeny of large dsDNA viruses using composition vector method, 2007, BMC Evolutionary Biology.

[26]  Libin Liu, et al.  Clustering DNA sequences by feature vectors, 2006, Molecular phylogenetics and evolution.

[27]  Chenglong Yu, et al.  A protein map and its application, 2008, DNA and cell biology.

[28]  Matthew N. Davies, et al.  Alignment-Independent Techniques for Protein Classification, 2008.

[29]  Xiang Fang, et al.  An improved string composition method for sequence comparison, 2008, BMC Bioinformatics.

[30]  Naruya Saitou, et al.  Estimation of bacterial species phylogeny through oligonucleotide frequency distances, 2009, Genomics.

[31]  K. Chou, et al.  REVIEW: Recent advances in developing web-servers for predicting protein attributes, 2009.

[32]  Kareem Carr, et al.  A Rapid Method for Characterization of Protein Relatedness Using Feature Vectors, 2010, PloS one.

[33]  Raymond H. Chan, et al.  Composition Vector Method for Phylogenetics — A Review, 2010.

[34]  Changchuan Yin, et al.  A Novel Construction of Genome Space with Biological Geometry, 2010, DNA research: an international journal for rapid publication of reports on genes and genomes.

[35]  L. Holm, et al.  The Pfam protein families database, 2005, Nucleic Acids Res.

[36]  K. Chou.  Some remarks on protein attribute prediction and pseudo amino acid composition, 2010, Journal of Theoretical Biology.

[37]  Stephen S.-T. Yau, et al.  DNA sequence comparison by a novel probabilistic method, 2011, Inf. Sci.

[38]  M. Nei, et al.  MEGA5: molecular evolutionary genetics analysis using maximum likelihood, evolutionary distance, and maximum parsimony methods, 2011, Molecular biology and evolution.

[39]  Chenglong Yu, et al.  Protein map: an alignment-free sequence comparison method based on various properties of amino acids, 2011, Gene.

[40]  Chenglong Yu, et al.  A Novel Method of Characterizing Genetic Sequences: Genome Space with Biological Distance and Applications, 2011, PloS one.

[41]  Raymond H. Chan, et al.  Composition Vector Method Based on Maximum Entropy Principle for Sequence Comparison, 2012, IEEE/ACM Transactions on Computational Biology and Bioinformatics.

[42]  Shek-Chung Yau, et al.  Protein space: a natural method for realizing the nature of protein universe, 2013, Journal of theoretical biology.