Sequence, Structure, and Evolution of Cellulases in Glycoside Hydrolase Family 48

Background: Cellulases are non-homologous isofunctional enzymes, which prevents their unambiguous identification in genomic data sets.
Results: Cellulases from glycoside hydrolase family 48 have distinct evolutionarily conserved sequence and structural features.
Conclusion: Conserved sequence/structure features can be used to differentiate cellulases from non-cellulases in genomic data sets.
Significance: Unambiguous identification of cellulases in genomic data is critical in searching for novel cellulolytic activities needed for bioenergy research.

Currently, the cost of cellulase enzymes remains a key economic impediment to commercialization of biofuels (1). Enzymes from glycoside hydrolase family 48 (GH48) are a critical component of numerous natural lignocellulose-degrading systems. Although computational mining of large genomic data sets is a promising new approach for identifying novel cellulolytic activities, current computational methods are unable to distinguish between cellulases and enzymes with different substrate specificities that belong to the same protein family. We show that by using a robust computational approach supported by experimental studies, cellulases and non-cellulases can be effectively identified within a given protein family. Phylogenetic analysis of GH48 showed non-monophyletic distribution, an indication of horizontal gene transfer. Enzymatic function of GH48 proteins coded by horizontally transferred genes was verified experimentally, which confirmed that these proteins are cellulases. Computational and structural studies of GH48 enzymes identified structural elements that define cellulases and can be used to computationally distinguish them from non-cellulases. We propose that the structural element that can be used for in silico discrimination between cellulases and non-cellulases belonging to GH48 is an ω-loop located on the surface of the molecule and characterized by highly conserved rare amino acids. These markers were used to screen metagenomics data for “true” cellulases.
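
The marker-based screening idea can be illustrated with a minimal sketch. It assumes the candidate sequences have already been aligned to a reference, so that each marker sits at a fixed alignment column. The column indices and residues in `MARKERS`, the function names, and the `min_fraction` threshold are hypothetical placeholders chosen for illustration; they are not the ω-loop positions or the procedure reported in the paper.

```python
# Hypothetical marker columns (0-based alignment columns) mapped to the
# residue expected there. Placeholders only, not values from the paper.
MARKERS = {3: "W", 5: "C", 8: "H"}


def matches_markers(aligned_seq, markers=MARKERS, min_fraction=1.0):
    """Return True if at least min_fraction of marker columns match."""
    hits = sum(
        1
        for col, residue in markers.items()
        if col < len(aligned_seq) and aligned_seq[col].upper() == residue
    )
    return hits / len(markers) >= min_fraction


def screen(alignment, markers=MARKERS, min_fraction=1.0):
    """Split aligned sequences into marker-positive names and the rest."""
    candidates, others = [], []
    for name, seq in alignment.items():
        if matches_markers(seq, markers, min_fraction):
            candidates.append(name)
        else:
            others.append(name)
    return candidates, others


if __name__ == "__main__":
    # Toy aligned sequences ('-' is a gap); invented for this example.
    toy_alignment = {
        "seq_a": "MKTWACDEHQ",
        "seq_b": "MKTWAC-EHQ",
        "seq_c": "MKTFACDEKQ",
    }
    candidates, others = screen(toy_alignment)
    print("marker-positive:", candidates)
    print("marker-negative:", others)
```

Requiring every marker to match (`min_fraction=1.0`) is the strictest choice; lowering it tolerates sequencing errors in short metagenomic reads at the cost of more false positives.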

[1]  Lynne A. Goodwin,et al.  An Insect Herbivore Microbiome with High Plant Biomass-Degrading Capacity , 2010, PLoS genetics.

[2]  N. Pannu,et al.  REFMAC5 for the refinement of macromolecular crystal structures , 2011, Acta crystallographica. Section D, Biological crystallography.

[3]  William Stafford,et al.  Metagenomic gene discovery: past, present and future. , 2005, Trends in biotechnology.

[4]  B. Henrissat,et al.  Genome analyses highlight the different biological roles of cellulases , 2012, Nature Reviews Microbiology.

[5]  V. Zverlov,et al.  Two noncellulosomal cellulases of Clostridium thermocellum, Cel9I and Cel48Y, hydrolyse crystalline cellulose synergistically. , 2007, FEMS microbiology letters.

[6]  T. N. Bhat,et al.  The Protein Data Bank , 2000, Nucleic Acids Res..

[7]  R. Haser,et al.  The crystal structure of the processive endocellulase CelF of Clostridium cellulolyticum in complex with a thiooligosaccharide inhibitor at 2.0 Å resolution , 1998, The EMBO journal.

[8]  Jon S. Robertson,et al.  Azospirillum Genomes Reveal Transition of Bacteria from Aquatic to Terrestrial Environments , 2011, PLoS genetics.

[9]  E. Koonin,et al.  Horizontal gene transfer in prokaryotes: quantification and classification. , 2001, Annual review of microbiology.

[10]  Geoffrey J. Barton,et al.  Jalview Version 2—a multiple sequence alignment editor and analysis workbench , 2009, Bioinform..

[11]  M. Nei,et al.  MEGA5: molecular evolutionary genetics analysis using maximum likelihood, evolutionary distance, and maximum parsimony methods. , 2011, Molecular biology and evolution.

[12]  Shenmin Zhang,et al.  Cloning, expression and characterization of a family 48 exocellulase, Cel48A, from Thermobifida fusca. , 2000, European journal of biochemistry.

[13]  T. Yamashita,et al.  A chitinase structurally related to the glycoside hydrolase family 48 is indispensable for the hormonally induced diapause termination in a beetle. , 2006, Biochemical and biophysical research communications.

[14]  S. Huws,et al.  Forage type and fish oil cause shifts in rumen bacterial diversity. , 2010, FEMS microbiology ecology.

[15]  Peter Williams,et al.  IMG: the integrated microbial genomes database and comparative analysis system , 2011, Nucleic Acids Res..

[16]  Robert D. Finn,et al.  HMMER web server: interactive sequence similarity searching , 2011, Nucleic Acids Res..

[17]  C. Liu,et al.  Properties of exgS, a gene for a major subunit of the Clostridium cellulovorans cellulosome. , 1998, Gene.

[18]  Richard J. Giannone,et al.  Insights into plant biomass conversion from the genome of the anaerobic thermophilic bacterium Caldicellulosiruptor bescii DSM 6725 , 2011, Nucleic acids research.

[19]  Kazutaka Katoh,et al.  Parallelization of the MAFFT multiple sequence alignment program , 2010, Bioinform..

[20]  P. Emsley,et al.  Features and development of Coot , 2010, Acta crystallographica. Section D, Biological crystallography.

[21]  D. Kilburn,et al.  Cellobiohydrolase B, a second exo-cellobiohydrolase from the cellulolytic bacterium Cellulomonas fimi. , 1995, The Biochemical journal.

[22]  R. Haser,et al.  Structures of mutants of cellulase Cel48F of Clostridium cellulolyticum in complex with long hemithiocellooligosaccharides give rise to a new view of the substrate pathway during processive action. , 2008, Journal of molecular biology.

[23]  M. Podar,et al.  Cellulases: ambiguous nonhomologous enzymes in a genomic perspective. , 2011, Trends in biotechnology.

[24]  J. Fetrow Omega loops; nonregular secondary structures significant in protein function and stability , 1995, FASEB journal : official publication of the Federation of American Societies for Experimental Biology.

[25]  R. Huber,et al.  Accurate Bond and Angle Parameters for X-ray Protein Structure Refinement , 1991 .

[26]  L. Holm,et al.  The Pfam protein families database , 2005, Nucleic Acids Res..

[27]  Vincent B. Chen,et al.  Correspondence e-mail: , 2000 .

[28]  N. Goldenfeld,et al.  Cellulosomics, a Gene-Centric Approach to Investigating the Intraspecific Diversity and Adaptation of Ruminococcus flavefaciens within the Rumen , 2011, PloS one.

[29]  P. Alzari,et al.  The crystal structure and catalytic mechanism of cellobiohydrolase CelS, the major enzymatic component of the Clostridium thermocellum Cellulosome. , 2002, Journal of molecular biology.

[30]  E. Birney,et al.  Pfam: the protein families database , 2013, Nucleic Acids Res..

[31]  S. Tringe,et al.  Metagenomic Discovery of Biomass-Degrading Genes and Genomes from Cow Rumen , 2011, Science.

[32]  Alexei Vagin,et al.  Molecular replacement with MOLREP. , 2010, Acta crystallographica. Section D, Biological crystallography.

[33]  Serge X. Cohen,et al.  Automated macromolecular model building for X-ray crystallography using ARP/wARP version 7 , 2008, Nature Protocols.

[34]  C. Yanofsky,et al.  Biochemical Features and Functional Implications of the RNA-Based T-Box Regulatory Mechanism , 2009, Microbiology and Molecular Biology Reviews.

[35]  Brandi L. Cantarel,et al.  The Carbohydrate-Active EnZymes database (CAZy): an expert resource for Glycogenomics , 2008, Nucleic Acids Res..

[36]  Raphael Lamed,et al.  Ruminococcus albus 8 Mutants Defective in Cellulose Degradation Are Deficient in Two Processive Endocellulases, Cel48A and Cel9B, Both of Which Possess a Novel Modular Architecture , 2004, Journal of bacteriology.

[37]  E. Koonin Orthologs, paralogs, and evolutionary genomics. , 2005, Annual review of genetics.

[38]  Randy J. Read,et al.  Overview of the CCP4 suite and current developments , 2011, Acta crystallographica. Section D, Biological crystallography.

[39]  T. Foust,et al.  Technoeconomic analysis of the dilute sulfuric acid and enzymatic hydrolysis process for the conversion of corn stover to ethanol , 2009 .

[40]  Haixu Tang,et al.  FragGeneScan: predicting genes in short and error-prone reads , 2010, Nucleic acids research.

[41]  L. Hauberg-Lotte,et al.  Functional characteristics of an endophyte community colonizing rice roots as revealed by metagenomic analysis. , 2012, Molecular plant-microbe interactions : MPMI.

[42]  Thomas L. Madden,et al.  Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. , 1997, Nucleic acids research.

[43]  M. Pedraza-Reyes,et al.  Expression, Characterization and Synergistic Interactions of Myxobacter Sp. AL-1 Cel9 and Cel48 Glycosyl Hydrolases , 2008, International journal of molecular sciences.

[44]  Genia Dubrovsky,et al.  Deletion of the Cel48S cellulase from Clostridium thermocellum , 2010, Proceedings of the National Academy of Sciences.

[45]  V. Martin,et al.  Global View of the Clostridium thermocellum Cellulosome Revealed by Quantitative Proteomic Analysis , 2007, Journal of bacteriology.

[46]  E. Bayer,et al.  Interplay between Clostridium thermocellum Family 48 and Family 9 Cellulases in Cellulosomal versus Noncellulosomal States , 2010, Applied and Environmental Microbiology.

[47]  K. Henrick,et al.  Secondary-structure matching (SSM), a new tool for fast protein structure alignment in three dimensions. , 2004, Acta crystallographica. Section D, Biological crystallography.

[48]  Amie D. Sluiter,et al.  Determination of Structural Carbohydrates and Lignin in Biomass , 2004 .

[50]  Lee R. Lynd,et al.  Diversity of Bacteria and Glycosyl Hydrolase Family 48 Genes in Cellulolytic Consortia Enriched from Thermophilic Biocompost , 2010, Applied and Environmental Microbiology.

[51]  A. Moya,et al.  Evaluating the Fidelity of De Novo Short Read Metagenomic Assembly Using Simulated Data , 2011, PloS one.

[52]  O. Gascuel,et al.  A simple, fast, and accurate algorithm to estimate large phylogenies by maximum likelihood. , 2003, Systematic biology.