Metagenomic Protein Function Prediction using SFLD and Thresholded Sequence Similarity Networks

Author(s): Yu, Jack
Advisor(s): Babbitt, Patricia C

Abstract: The Structure-Function Linkage Database (SFLD) is a database of hierarchical enzyme classifications that relates specific sequence-structure features to specific chemical properties. It provides a collection of tools and data for investigating sequence-structure-function relationships and for generating functional hypotheses. Currently, users can query one or more “unknown” protein sequences against the database using hidden Markov model (HMM) or BLAST searches, and then compare, classify, and annotate them against existing curated enzyme superfamilies, the largest groupings of proteins for which common ancestry can be inferred. Here we present a working pipeline that allows users to putatively assign functions to sequences derived from metagenomic studies and to visualize relationships between these sequences and existing enzyme superfamilies using thresholded sequence similarity networks.
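To make the idea of a thresholded sequence similarity network concrete, the following is a minimal sketch under stated assumptions; it is not the pipeline described in this work. It assumes each sequence pair already has a similarity score (here, hypothetically, -log10 of a BLAST E-value). Two sequences are joined by an edge only when their score meets a chosen threshold. Each unannotated metagenomic sequence is then given the most common superfamily label among the annotated sequences in its connected component. The sequence IDs, scores, threshold, and the majority-vote assignment rule are all illustrative assumptions.

```python
from collections import Counter


def build_ssn(scores, threshold):
    """Build adjacency sets; an edge joins a pair whose score meets the threshold."""
    adj = {}
    for (a, b), score in scores.items():
        adj.setdefault(a, set())  # keep every sequence as a node, even if isolated
        adj.setdefault(b, set())
        if score >= threshold:
            adj[a].add(b)
            adj[b].add(a)
    return adj


def connected_components(adj):
    """Return the connected components (clusters) of the network via iterative DFS."""
    seen = set()
    components = []
    for start in adj:
        if start in seen:
            continue
        seen.add(start)
        stack = [start]
        component = set()
        while stack:
            node = stack.pop()
            component.add(node)
            for neighbor in adj[node]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    stack.append(neighbor)
        components.append(component)
    return components


def assign_functions(components, labels):
    """Give each unlabeled node the majority label of its cluster (None if no labels)."""
    predictions = {}
    for component in components:
        counts = Counter(labels[n] for n in component if n in labels)
        best = counts.most_common(1)[0][0] if counts else None
        for node in component:
            if node not in labels:
                predictions[node] = best
    return predictions


# Hypothetical pairwise scores (-log10 E-value) and curated superfamily labels.
scores = {
    ("ref1", "ref2"): 60,
    ("ref1", "meta1"): 45,
    ("ref2", "ref3"): 5,
    ("ref3", "meta2"): 30,
    ("meta1", "meta2"): 8,
}
labels = {"ref1": "enolase", "ref2": "enolase", "ref3": "amidohydrolase"}
THRESHOLD = 20  # hypothetical cutoff, i.e. E-value <= 1e-20

network = build_ssn(scores, THRESHOLD)
components = connected_components(network)
predictions = assign_functions(components, labels)
print(predictions)  # {'meta1': 'enolase', 'meta2': 'amidohydrolase'}
```

Raising the threshold splits clusters into smaller, more functionally homogeneous groups, while lowering it merges them; choosing this cutoff is the main design decision in any thresholded network.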