Using a literature-based NMF model for discovering gene functional relationships

The rapid growth of the biomedical literature and of genomic information presents a major challenge for determining the functional relationships among genes. In this study, we develop a Web-based bioinformatics software environment called FAUN, for feature annotation using nonnegative matrix factorization (NMF), to facilitate both the discovery and classification of functional relationships among genes. We discuss both the computational complexity and the parameterization of NMF for processing gene sets. We tested FAUN on three manually constructed gene document collections and then used it to analyze several microarray-derived gene sets obtained from studies of the developing cerebellum in normal and mutant mice. FAUN provides utilities for collaborative knowledge discovery and for identifying new gene relationships from text streams and repositories (e.g., MEDLINE). It is particularly useful for validating and analyzing gene associations suggested by microarray experiments.
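To make the underlying idea concrete, the following is a minimal sketch of NMF applied to a small term-by-gene matrix. It is illustrative only and is not FAUN's implementation: the abstract does not state which NMF algorithm or parameters FAUN uses, so the multiplicative update rule, the rank `k`, the iteration count, and the toy terms, gene names, and counts below are all assumptions.

```python
import numpy as np

def nmf(A, k, n_iter=200, eps=1e-9, seed=0):
    """Approximate a nonnegative matrix A (terms x genes) as W @ H.

    Uses multiplicative updates for the Frobenius-norm objective
    (an assumed choice; the source does not specify the algorithm).
    W is terms x k (feature vectors), H is k x genes (gene weights).
    """
    rng = np.random.default_rng(seed)
    m, n = A.shape
    W = rng.random((m, k))
    H = rng.random((k, n))
    for _ in range(n_iter):
        # eps guards against division by zero; updates keep W, H >= 0
        H *= (W.T @ A) / (W.T @ W @ H + eps)
        W *= (A @ H.T) / (W @ H @ H.T + eps)
    return W, H

# Hypothetical toy data: rows are terms, columns are gene documents.
terms = ["synapse", "neuron", "axon", "kinase", "phosphorylation", "signaling"]
genes = ["geneA", "geneB", "geneC", "geneD"]
A = np.array([
    [3, 2, 0, 0],
    [2, 3, 0, 0],
    [1, 2, 0, 0],
    [0, 0, 3, 2],
    [0, 0, 2, 3],
    [0, 0, 1, 2],
], dtype=float)

W, H = nmf(A, k=2)

# Describe each feature by its highest-weighted terms and genes.
for j in range(W.shape[1]):
    top_terms = [terms[i] for i in np.argsort(W[:, j])[::-1][:3]]
    top_genes = [genes[i] for i in np.argsort(H[j])[::-1][:2]]
    print(f"feature {j}: terms={top_terms} genes={top_genes}")
```

In a sketch like this, each column of `W` can be read as a candidate functional theme described by its dominant terms, and each row of `H` indicates how strongly each gene document is associated with that theme. Results depend on the chosen rank and on the random initialization.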
