COGNIZER: A Framework for Functional Annotation of Metagenomic Datasets

Background: Recent advances in sequencing technologies have resulted in an unprecedented increase in the number of metagenomes being sequenced worldwide. Given their volume, functional annotation of metagenomic sequence datasets requires specialized computational tools and techniques. Despite their high accuracy, existing stand-alone functional annotation tools require end-users to perform compute-intensive homology searches of metagenomic datasets against multiple databases prior to functional analysis. Although web-based functional annotation servers partly address the limited availability of compute resources, uploading and analyzing huge volumes of sequence data on a shared public web service has its own set of limitations. In this study, we present COGNIZER, a comprehensive stand-alone annotation framework that enables end-users to functionally annotate the sequences constituting metagenomic datasets. The COGNIZER framework provides multiple workflow options. A subset of these options employs a novel directed-search strategy that helps reduce the overall compute requirements for end-users. The COGNIZER framework includes a cross-mapping database that enables end-users to simultaneously derive KEGG, Pfam, GO, and SEED subsystem information from the COG annotations.

Results: Validation experiments performed with real-world metagenomes and metatranscriptomes, generated using diverse sequencing technologies, indicate that the novel directed-search strategy employed in COGNIZER helps reduce compute requirements without significant loss in annotation accuracy. A comparison of COGNIZER's results with pre-computed benchmark values indicates the reliability of the cross-mapping database employed in COGNIZER.

Conclusion: The COGNIZER framework is capable of comprehensively annotating, in functional terms, any metagenomic or metatranscriptomic dataset from varied sequencing platforms. Multiple search options in COGNIZER give end-users the flexibility to choose a homology search protocol based on available compute resources. The cross-mapping database in COGNIZER is of high utility since it enables end-users to directly derive KEGG, Pfam, GO, and SEED subsystem annotations from COG categorizations. Furthermore, the availability of COGNIZER as a stand-alone, scalable implementation is expected to make it a valuable annotation tool in the field of metagenomic research.

Availability and Implementation: A Linux implementation of COGNIZER is freely available for download from the following links: http://metagenomics.atc.tcs.com/cognizer, https://metagenomics.atc.tcs.com/function/cognizer.
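
The idea of deriving several annotation types from a single COG assignment can be illustrated with a minimal Python sketch. It is not COGNIZER's implementation: the table layout, the function name `cross_map`, and every identifier (`COG_A`, `KO_1`, `PF_1`, and so on) are hypothetical placeholders chosen for illustration, and the real cross-mapping database may be organized quite differently. The sketch assumes each read has already been assigned one COG by a prior homology search.

```python
from collections import defaultdict

# Hypothetical cross-mapping table. All identifiers are placeholders,
# not real COG, KEGG, Pfam, GO, or SEED entries.
CROSS_MAP = {
    "COG_A": {"KEGG": ["KO_1"], "Pfam": ["PF_1", "PF_2"], "GO": ["GO_1"], "SEED": ["SS_1"]},
    "COG_B": {"KEGG": ["KO_2"], "Pfam": ["PF_3"], "GO": [], "SEED": ["SS_1"]},
}


def cross_map(read_to_cog, table=CROSS_MAP):
    """Count, per target database, how many reads map to each identifier.

    Reads whose COG is absent from the table contribute nothing.
    """
    profiles = defaultdict(lambda: defaultdict(int))
    for cog in read_to_cog.values():
        for database, identifiers in table.get(cog, {}).items():
            for identifier in identifiers:
                profiles[database][identifier] += 1
    # Convert nested defaultdicts to plain dicts for easier inspection.
    return {database: dict(counts) for database, counts in profiles.items()}


# Assumed input: read identifiers paired with the COG assigned to each read.
reads = {"read1": "COG_A", "read2": "COG_B", "read3": "COG_A", "read4": "COG_UNKNOWN"}
profiles = cross_map(reads)
for database, counts in sorted(profiles.items()):
    print(database, counts)
```

Because the lookup is a plain dictionary access per read, the expensive homology search is needed only once, against the COG database; the other four profiles follow from the table.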
