New functional families (FunFams) in CATH to improve the mapping of conserved functional sites to 3D structures

CATH version 3.5 (Class, Architecture, Topology, Homology, available at http://www.cathdb.info/) contains 173 536 domains, 2626 homologous superfamilies and 1313 fold groups. When focusing on structural genomics (SG) structures, we observe that the number of new folds for CATH v3.5 is slightly lower than for previous releases, which suggests that we may now know the majority of folds that are easily accessible to structure determination. We have improved the accuracy of our functional family (FunFam) sub-classification method, and the CATH sequence domain search facility has been extended to provide FunFam annotations for each domain. The CATH website has been redesigned, with improved display of functional data and of conserved sequence features associated with FunFams within each CATH superfamily.
