Novelty Indicator for Enhanced Prioritization of Predicted Gene Ontology Annotations

Biomolecular controlled annotations have become pivotal in computational biology because they allow scientists to analyze large amounts of biological data, better understand test results, and infer new knowledge. Yet biomolecular annotation databases are incomplete by definition, like our knowledge of biology, and may contain errors and inconsistent information. In this context, machine-learning algorithms that predict and prioritize new annotations are both effective and efficient, especially when compared with time-consuming biological validation experiments. To limit the possibility that these techniques predict obvious and trivial high-level features, and to help prioritize their results, we introduce a new element that can improve the accuracy and relevance of the results of an annotation prediction and prioritization pipeline. We propose a novelty indicator that states the level of "originality" of the annotations to Gene Ontology (GO) terms predicted for a specific gene. Combined with our previously introduced prediction steps, this indicator helps prioritize the most novel and interesting predicted annotations. We performed a thorough biological functional analysis of the prioritized annotations that our indicator and previously proposed methods predicted with high accuracy. The relevance of our biological findings supports the effectiveness and trustworthiness of our indicator and of its prioritization of predicted annotations.
