Using a literature-based NMF model for discovering gene functional relationships

The rapid growth of the biomedical literature and of genomic information makes it increasingly difficult to determine the functional relationships among genes. In this study, we develop FAUN (feature annotation using nonnegative matrix factorization, NMF), a Web-based bioinformatics software environment that facilitates both the discovery and the classification of functional relationships among genes. We discuss both the computational complexity and the parameterization of NMF for processing gene sets. We tested FAUN on three manually constructed gene document collections and then used it to analyze several microarray-derived gene sets obtained from studies of the developing cerebellum in normal and mutant mice. FAUN provides utilities for collaborative knowledge discovery and for identifying new gene relationships from text streams and repositories (e.g., MEDLINE). It is particularly useful for validating and analyzing gene associations suggested by microarray experiments.
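The abstract does not spell out the factorization itself, so the following is only a rough illustration of the general idea. It factors a small term-by-gene-document matrix A into nonnegative factors W (term weights per feature) and H (feature weights per gene document), with the rank k as the main parameter. It assumes the Lee and Seung multiplicative update rules for the Frobenius norm, which may differ from the updates, term weighting, and initialization FAUN actually uses. The matrix, the helper names (`nmf`, `assign_features`, `frob_error`), and the grouping of gene documents are all hypothetical toy choices.

```python
import random

# Hypothetical toy term-by-gene-document matrix: rows are terms, columns are
# gene documents g0..g3. Terms 0-1 occur with g0, g1; terms 2-3 with g2, g3.
A = [
    [3.0, 2.0, 0.0, 0.0],
    [2.0, 3.0, 0.0, 0.0],
    [0.0, 0.0, 4.0, 1.0],
    [0.0, 0.0, 1.0, 3.0],
]


def transpose(M):
    return [list(col) for col in zip(*M)]


def matmul(X, Y):
    # Plain-Python product of an (m x n) and an (n x p) matrix.
    Yt = transpose(Y)
    return [[sum(a * b for a, b in zip(row, col)) for col in Yt] for row in X]


def nmf(A, k, iters=500, seed=0, eps=1e-9):
    """Approximate nonnegative A (m x n) by W (m x k) times H (k x n) using
    Lee-Seung multiplicative updates (an assumption, not necessarily FAUN's)."""
    rng = random.Random(seed)
    m, n = len(A), len(A[0])
    W = [[rng.random() + eps for _ in range(k)] for _ in range(m)]
    H = [[rng.random() + eps for _ in range(n)] for _ in range(k)]
    for _ in range(iters):
        # H <- H * (W^T A) / (W^T W H); eps guards against division by zero.
        Wt = transpose(W)
        num, den = matmul(Wt, A), matmul(matmul(Wt, W), H)
        H = [[H[i][j] * num[i][j] / (den[i][j] + eps) for j in range(n)]
             for i in range(k)]
        # W <- W * (A H^T) / (W H H^T), using the freshly updated H.
        Ht = transpose(H)
        num, den = matmul(A, Ht), matmul(W, matmul(H, Ht))
        W = [[W[i][j] * num[i][j] / (den[i][j] + eps) for j in range(k)]
             for i in range(m)]
    return W, H


def frob_error(A, W, H):
    # ||A - WH||_F, the objective these updates aim to reduce.
    WH = matmul(W, H)
    return sum((a - b) ** 2 for ra, rb in zip(A, WH) for a, b in zip(ra, rb)) ** 0.5


def assign_features(H):
    # Label each gene document (column of H) with its dominant feature.
    k, n = len(H), len(H[0])
    return [max(range(k), key=lambda f: H[f][j]) for j in range(n)]


if __name__ == "__main__":
    W, H = nmf(A, k=2)
    print("feature per gene document:", assign_features(H))
    print("residual ||A - WH||_F:", round(frob_error(A, W, H), 3))
```

On this toy data, gene documents that share vocabulary should end up with the same dominant feature. Each iteration is dominated by the products W^T A and A H^T, roughly O(mnk) for a dense m x n matrix, which is one reason the choice of k matters as gene sets grow.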
