UPObase: an online database of unspecific peroxygenases

Abstract Many fungal species secrete unspecific peroxygenases (UPOs) or UPO-like extracellular enzymes. These enzymes are notable for catalyzing a wide variety of reactions, such as epoxidation, peroxygenation and electron oxidations. The enzyme family is functionally diverse and comprises thousands of UPO and UPO-like sequences. These sequences are difficult to analyze without a proper management tool, which calls for a unified platform that can aid annotation, classification, navigation and sequence retrieval. This prompted us to create an online database, the Unspecific Peroxygenase Database (UPObase) (upobase.bioinformaticsreview.com), which currently includes 1948 peroxygenase-encoding protein sequences mined from more than 800 available fungal genomes. For each sequence it provides information such as classification and motifs, and it offers functions such as homology search against UPObase and sequence analyses such as multiple sequence alignments and phylogenetic trees. It also provides a facility for submitting new sequences. The database is designed to be user-friendly, with systematic search and filters: users can search for sequences by organism name, cluster ID and accession number. Notably, in our previous study, 113 UPOs were classified into five subfamilies (I, II, III, IV and V) and an undetermined group (Pog), and this classification remains established. In this study, using the 1948 UPOs in our database, we further identified six novel sub-superfamilies (Pog-a, Pog-b, Pog-c, Pog-d, Pog-e and Pog-f) with signature motifs, and two distinct groups in each of Subfamilies I and III (Ia and Ib; IIIa and IIIb, respectively). With its novel UPO-like sequences and classification, UPObase may serve as a resource for researchers working in enzyme engineering and related fields.
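The three search keys named in the abstract (organism name, cluster ID and accession number) can be illustrated with a minimal sketch. This is not UPObase's actual implementation or API: the `UpoRecord` type, the `search` function, the matching rules (case-insensitive substring for organism, exact match for the other two keys) and all records are hypothetical placeholders. In particular, using subfamily labels as cluster IDs is an assumption made only for the example.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass(frozen=True)
class UpoRecord:
    """Hypothetical record type; field names are assumptions, not UPObase's schema."""
    accession: str
    organism: str
    cluster_id: str


def search(
    records: Iterable[UpoRecord],
    organism: Optional[str] = None,
    cluster_id: Optional[str] = None,
    accession: Optional[str] = None,
) -> List[UpoRecord]:
    """Return records matching every filter that was supplied (None means 'no filter')."""
    hits = []
    for rec in records:
        # Organism: case-insensitive substring match (assumed behaviour).
        if organism is not None and organism.lower() not in rec.organism.lower():
            continue
        # Cluster ID and accession: exact match (assumed behaviour).
        if cluster_id is not None and rec.cluster_id != cluster_id:
            continue
        if accession is not None and rec.accession != accession:
            continue
        hits.append(rec)
    return hits


# Placeholder data, invented for illustration only; not taken from UPObase.
RECORDS = [
    UpoRecord("ACC0001", "Fungus alpha", "Ia"),
    UpoRecord("ACC0002", "Fungus beta", "Pog-a"),
    UpoRecord("ACC0003", "Fungus alpha", "IIIb"),
]

if __name__ == "__main__":
    for rec in search(RECORDS, organism="alpha"):
        print(rec.accession, rec.organism, rec.cluster_id)
```

Filters combine with logical AND, so supplying both an organism and a cluster ID narrows the result set rather than widening it.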
