Biobibliometrics: information retrieval and visualization from co-occurrences of gene names in Medline abstracts.
Successful information retrieval from biomedical literature databases is becoming increasingly difficult. We have developed a prototype system for retrieving and visualizing information from literature and genomic databases using gene names. The premise of our work is that, if two genes have a related biological function, their names (or aliases of those genes) are more likely to co-occur within the biomedical literature. From a collection of Medline documents, we have extracted the number of co-occurrences of every pair of Saccharomyces cerevisiae genes. Each query is automatically conflated to include gene aliases as well. In addition, the user can filter the retrieved document set with a MeSH term. From this co-occurrence data we construct a matrix that contains a dissimilarity measurement for every pair of genes, based on their joint and individual occurrence statistics. A graph is generated from this matrix, with node and edge inclusion determined by a user-defined threshold. Nodes of the graph represent genes, while edge lengths are a function of the co-occurrence of the two genes within the literature. Nodes can be hypertext-linked to sequence databases, while edges are linked to the Medline documents that generated them. The system is a tool for efficiently exploring the biomedical information landscape and may act as an inference network.
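The pipeline described above (alias conflation, pairwise co-occurrence counts, a dissimilarity matrix, and a thresholded graph) can be sketched roughly as follows. This is a minimal illustration, not the system's actual implementation: the abstract does not specify the dissimilarity formula, so a Jaccard distance over document sets is assumed here, and the function names, the toy documents, and the alias table are all hypothetical.

```python
import re
from itertools import combinations

def gene_documents(docs, aliases):
    """Map each gene to the ids of documents that mention it or any alias."""
    hits = {gene: set() for gene in aliases}
    for doc_id, text in docs.items():
        tokens = set(re.findall(r"[A-Za-z0-9]+", text.upper()))
        for gene, names in aliases.items():
            if tokens & {name.upper() for name in names}:
                hits[gene].add(doc_id)
    return hits

def dissimilarity(hits, a, b):
    """Assumed measure: Jaccard distance between the two document sets."""
    union = hits[a] | hits[b]
    if not union:
        return 1.0
    return 1.0 - len(hits[a] & hits[b]) / len(union)

def build_graph(hits, threshold):
    """Keep an edge when dissimilarity <= threshold; store supporting docs."""
    edges = {}
    for a, b in combinations(sorted(hits), 2):
        d = dissimilarity(hits, a, b)
        if d <= threshold:
            edges[(a, b)] = (d, sorted(hits[a] & hits[b]))
    return edges

# Toy data, invented for illustration only.
docs = {
    1: "CDC28 binds CLN2 in G1.",
    2: "Cdk1 activity requires Cln2.",
    3: "ACT1 encodes actin.",
    4: "CDC28 phosphorylation was measured.",
}
aliases = {"CDC28": {"CDC28", "CDK1"}, "CLN2": {"CLN2"}, "ACT1": {"ACT1"}}

hits = gene_documents(docs, aliases)
edges = build_graph(hits, threshold=0.5)
print(edges)
```

In this sketch each edge keeps the ids of the documents in which both genes appear, mirroring the idea that edges link back to the Medline documents that generated them. A MeSH filter could be approximated by restricting `docs` before calling `gene_documents`.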