UniRef clusters: a comprehensive and scalable alternative for improving sequence similarity searches
Peter B. McGarvey | Cathy H. Wu | Yuqi Wang | Baris E. Suzek | Hongzhan Huang
[1] The UniProt Consortium, et al. Update on activities at the Universal Protein Resource (UniProt) in 2013, 2012, Nucleic Acids Res.
[2] Todd H. Oakley, et al. Gene duplication and the origins of morphological complexity in pancrustacean eyes, a genomic approach, 2010, BMC Evolutionary Biology.
[3] Robert D. Finn, et al. InterPro in 2011: new developments in the family and domain prediction database, 2011, Nucleic Acids Res.
[4] A. Godzik, et al. Sequence clustering strategies improve remote homology recognitions while reducing search times, 2002, Protein Engineering.
[5] Jing Hu, et al. SIFT web server: predicting effects of amino acid substitutions on proteins, 2012, Nucleic Acids Res.
[6] Ketil Malde, et al. Increasing Sequence Search Sensitivity with Transitive Alignments, 2013, PLoS ONE.
[7] E. Birney, et al. Pfam: the protein families database, 2013, Nucleic Acids Res.
[8] Russ B. Altman, et al. Improving the prediction of disease-related variants using protein three-dimensional structure, 2011, BMC Bioinformatics.
[9] M. Ashburner, et al. Gene Ontology: tool for the unification of biology, 2000, Nature Genetics.
[10] Robert D. Finn, et al. Representative Proteomes: A Stable, Scalable and Unbiased Proteome Set for Sequence Analysis and Functional Annotation, 2011, PLoS ONE.
[11] Tatsuya Akutsu, et al. Clustering of database sequences for fast homology search using upper bounds on alignment score, 2004, Genome Informatics. International Conference on Genome Informatics.
[12] Li Ni, et al. The Gene Ontology's Reference Genome Project: A Unified Framework for Functional Annotation across Species, 2009, PLoS Comput. Biol.
[13] R. Altman, et al. A new disease-specific machine learning approach for the prediction of cancer-causing missense variants, 2011, Genomics.
[14] Adam Godzik, et al. Cd-hit: a fast program for clustering and comparing large sets of protein or nucleotide sequences, 2006, Bioinform.
[15] Christos A. Ouzounis, et al. The properties of protein family space depend on experimental design, 2005, Bioinform.
[16] Elon Portugaly, et al. Efficient algorithms for accurate hierarchical clustering of huge datasets: tackling the entire protein space, 2008, ISMB.
[17] Michael Gribskov, et al. Use of Receiver Operating Characteristic (ROC) Analysis to Evaluate Sequence Matching, 1996, Comput. Chem.
[18] David A. Lee, et al. Identification and distribution of protein families in 120 completed genomes using Gene3D, 2005, Proteins.
[19] Robert S. Ledley, et al. PIRSF: family classification system at the Protein Information Resource, 2004, Nucleic Acids Res.
[20] Adam Godzik, et al. Clustering of highly homologous sequences to reduce the size of large protein databases, 2001, Bioinform.
[21] Liisa Holm, et al. RSDB: representative protein sequence databases have high information content, 2000, Bioinform.
[22] Hugh E. Williams, et al. Clustered Sequence Representation for Fast Homology Search, 2007, J. Comput. Biol.
[23] Daniel J. Nasko, et al. VIROME: a standard operating procedure for analysis of viral metagenome sequences, 2012, Standards in Genomic Sciences.
[24] Eugene Kolker, et al. Quantifying Protein Function Specificity in the Gene Ontology, 2010, Standards in Genomic Sciences.
[25] María Martín, et al. Activities at the Universal Protein Resource (UniProt), 2013, Nucleic Acids Res.
[26] E. Myers, et al. Basic local alignment search tool, 1990, Journal of Molecular Biology.
[27] M. Gerstein, et al. The dominance of the population by a selected few: power-law behaviour applies to a wide variety of genomic properties, 2002, Genome Biology.
[28] Peer Bork, et al. A Computational Screen for Type I Polyketide Synthases in Metagenomics Shotgun Data, 2008, PLoS ONE.
[29] Anthony J. Kusalik, et al. The oligodeoxynucleotide sequences corresponding to never-expressed peptide motifs are mainly located in the non-coding strand, 2010, BMC Bioinformatics.
[30] D. Higgins, et al. Fast, scalable generation of high-quality protein multiple sequence alignments using Clustal Omega, 2011, Molecular Systems Biology.
[31] Peter B. McGarvey, et al. UniRef: comprehensive and non-redundant UniProt reference clusters, 2007, Bioinform.
[32] Cédric Notredame, et al. Accurate multiple sequence alignment of transmembrane proteins with PSI-Coffee, 2012, BMC Bioinformatics.