Alignment-free clustering of large data sets of unannotated protein conserved regions using minhashing
Anantharaman Kalyanaraman | Shira L. Broschat | Armen Abnousi
[1] Ulrik Brandes, et al. Analysis and Visualization of Social Networks, 2003, Graph Drawing Software.
[2] Piotr Indyk, et al. Approximate nearest neighbors: towards removing the curse of dimensionality, 1998, STOC '98.
[3] Fan Yang, et al. TIGRFAMs: a protein family resource for the functional identification of proteins, 2001, Nucleic Acids Res.
[4] Ravi Kumar, et al. Discovering Large Dense Subgraphs in Massive Graphs, 2005, VLDB.
[5] Peer Bork, et al. SMART: recent updates, new developments and status in 2015, 2014, Nucleic Acids Res.
[6] O. Uhlenbeck, et al. Cloning and biochemical characterization of Bacillus subtilis YxiN, a DEAD protein specifically activated by 23S rRNA: delineation of a novel sub-family of bacterial DEAD proteins, 1999, Nucleic acids research.
[7] S. B. Needleman, C. D. Wunsch. A General Method Applicable to the Search for Similarities in the Amino Acid Sequence of Two Proteins, 1970, Journal of molecular biology.
[8] Robert D. Finn, et al. The Pfam protein families database: towards a more sustainable future, 2015, Nucleic Acids Res.
[9] Changjun Wu, et al. pGraph: Efficient Parallel Construction of Large-Scale Protein Sequence Homology Graphs, 2012, IEEE Transactions on Parallel and Distributed Systems.
[10] Sriram Krishnamoorthy, et al. A work stealing based approach for enabling scalable optimal sequence homology detection, 2015, J. Parallel Distributed Comput.
[11] P. Bork, et al. Evolutionarily mobile modules in proteins, 1993, Scientific American.
[12] R. Durbin, et al. Pfam: A comprehensive database of protein domain families based on seed alignments, 1997, Proteins.
[13] Zhengwei Zhu, et al. CD-HIT: accelerated for clustering the next-generation sequencing data, 2012, Bioinform.
[14] J. Schultz, et al. SMART, a simple modular architecture research tool: identification of signaling domains, 1998, Proceedings of the National Academy of Sciences of the United States of America.
[15] Andrei Z. Broder, et al. On the resemblance and containment of documents, 1997, Proceedings. Compression and Complexity of SEQUENCES 1997 (Cat. No.97TB100171).
[16] Jean-Loup Guillaume, et al. Fast unfolding of communities in large networks, 2008, arXiv:0803.0476.
[17] A. Kalyanaraman, et al. A Fast Alignment-Free Approach for De Novo Detection of Protein Conserved Regions, 2016, PloS one.
[18] T. Attwood, et al. PRINTS: a database of protein motif fingerprints, 1994, Nucleic acids research.
[19] Amos Bairoch, et al. PROSITE: A Documented Database Using Patterns and Profiles as Motif Descriptors, 2002, Briefings Bioinform.
[20] Changjun Wu, et al. An efficient parallel approach for identifying protein families in large-scale metagenomic data sets, 2008, 2008 SC - International Conference for High Performance Computing, Networking, Storage and Analysis.
[21] Nathan Linial, et al. EVEREST: automatic identification and classification of protein domains in all protein sequences, 2006, BMC bioinformatics.
[22] Alan M. Frieze, et al. Min-Wise Independent Permutations, 2000, J. Comput. Syst. Sci.
[23] Fedor V. Karginov, et al. The carboxy-terminal domain of the DExDH protein YxiN is sufficient to confer specificity for 23S rRNA, 2002, Journal of molecular biology.
[24] Cathy H. Wu, et al. UniProt: the Universal Protein knowledgebase, 2004, Nucleic Acids Res.
[25] Sanjay Ghemawat, et al. MapReduce: Simplified Data Processing on Large Clusters, 2004, OSDI.
[26] Anantharaman Kalyanaraman, et al. Parallel Heuristics for Scalable Community Detection, 2014, 2014 IEEE International Parallel & Distributed Processing Symposium Workshops.
[27] Inderjit S. Dhillon, et al. Overlapping community detection using seed set expansion, 2013, CIKM.
[28] Geoffrey Zweig, et al. Syntactic Clustering of the Web, 1997, Comput. Networks.
[29] Steven J. Plimpton, et al. MapReduce in MPI for Large-scale graph algorithms, 2011, Parallel Comput.
[30] Jérôme Gracy, et al. Automated protein sequence database classification. II. Delineation of domain boundaries from sequence similarities, 1998, Bioinform.
[31] M. S. Waterman, et al. Identification of common molecular subsequences, 1981, Journal of molecular biology.
[32] M. Linial, et al. Protein Clustering and Classification, 2004.
[33] Svetlana Lockwood, et al. Applications and extensions of pClust to big microbial proteomic data, 2016.
[34] L. Holm, et al. Exhaustive enumeration of protein domain families, 2003, Journal of molecular biology.
[35] S. Broschat, et al. Comparative genomics reveals multiple pathways to mutualism for tick-borne pathogens, 2016, BMC Genomics.
[36] Gesine Reinert, et al. Alignment-Free Sequence Comparison (I): Statistics and Power, 2009, J. Comput. Biol.
[37] Piotr Indyk, et al. Similarity Search in High Dimensions via Hashing, 1999, VLDB.
[38] Peer Bork, et al. SMART: recent updates, new developments and status in 2020, 2020, Nucleic Acids Res.
[39] E. Myers, et al. Basic local alignment search tool, 1990, Journal of molecular biology.
[40] Jérôme Gracy, et al. Automated protein sequence database classification. I. Integration of compositional similarity search, local similarity search, and multiple sequence alignment, 1998, Bioinform.
[41] E. Snyder, et al. Rickettsia Phylogenomics: Unwinding the Intricacies of Obligate Intracellular Life, 2008, PloS one.