Paper information - Clustering of highly homologous sequences to reduce the size of large protein databases

Clustering of highly homologous sequences to reduce the size of large protein databases

We present a fast and flexible program for clustering large protein databases at different sequence identity levels. It takes less than 2 h for the all-against-all sequence comparison and clustering of the non-redundant protein database of over 560,000 sequences on a high-end PC. The output database, including only the representative sequences, can be used for more efficient and sensitive database searches.
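The abstract does not describe the clustering procedure itself, so the following is only a minimal sketch of one common way to pick representative sequences at a chosen identity level: greedy incremental clustering, with the longest sequences processed first. The names `lcs_length`, `identity`, and `greedy_cluster` are hypothetical, and the identity measure (longest common subsequence divided by the shorter length) is an illustrative stand-in for a real alignment-based sequence identity, not the measure used by the program.

```python
from typing import Dict, List


def lcs_length(a: str, b: str) -> int:
    """Length of the longest common subsequence of a and b (dynamic programming)."""
    prev = [0] * (len(b) + 1)
    for ch_a in a:
        curr = [0]
        for j, ch_b in enumerate(b, start=1):
            if ch_a == ch_b:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def identity(a: str, b: str) -> float:
    """Assumed stand-in for sequence identity: LCS length over the shorter length."""
    shorter = min(len(a), len(b))
    if shorter == 0:
        return 0.0
    return lcs_length(a, b) / shorter


def greedy_cluster(sequences: Dict[str, str], threshold: float) -> Dict[str, List[str]]:
    """Map each representative name to the names of the sequences in its cluster.

    Sequences are visited longest first (ties broken by name). A sequence joins
    the first existing representative it matches at or above the threshold;
    otherwise it becomes a new representative.
    """
    clusters: Dict[str, List[str]] = {}
    order = sorted(sequences, key=lambda name: (-len(sequences[name]), name))
    for name in order:
        seq = sequences[name]
        for rep in clusters:
            if identity(sequences[rep], seq) >= threshold:
                clusters[rep].append(name)
                break
        else:
            clusters[name] = [name]
    return clusters


if __name__ == "__main__":
    toy = {"a": "MKTAYIAKQR", "b": "MKTAYIAKQ", "c": "GGGGSGGGGS"}
    for rep, members in greedy_cluster(toy, threshold=0.9).items():
        print(rep, members)
```

In this sketch the reduced database would consist of the dictionary keys only. Each sequence is compared against representatives rather than against every other sequence, which is what keeps the greedy scheme cheaper than a full all-against-all comparison.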

[1] Thomas L. Madden, et al. Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Research, 1997.

[2] Chris Sander, et al. Removing near-neighbour redundancy from large protein sequence collections. Bioinformatics, 1998.