Discovering gene functional relationships using FAUN (Feature Annotation Using Nonnegative matrix factorization)

Background: Searching the enormous amount of information available in biomedical literature to extract novel functional relationships among genes remains a challenge in the field of bioinformatics. While numerous software tools have been developed to extract and identify gene relationships from biological databases, few effectively extract new (or implied) gene relationships, a process that is useful in interpreting discovery-oriented genome-wide experiments.

Results: In this study, we develop a Web-based bioinformatics software environment called FAUN, or Feature Annotation Using Nonnegative matrix factorization (NMF), to facilitate both the discovery and classification of functional relationships among genes. Both the computational complexity and parameterization of NMF for processing gene sets are discussed. FAUN is tested on three manually constructed gene document collections. Its utility and performance as a knowledge discovery tool are demonstrated using a set of genes associated with autism.

Conclusions: FAUN not only helps researchers use biomedical literature efficiently, but also provides utilities for knowledge discovery. This Web-based software environment may be useful for the validation and analysis of functional associations in gene subsets identified by high-throughput experiments.
