Discovery and assessment of gene-disease associations by integrated analysis of scientific literature and microarray data

The paper outlines a methodology and presents a tool that helps biomedical researchers interpret complex experiments by automatically discovering gene networks and the underlying biological processes revealed by gene-expression patterns, a task that is usually performed manually with existing tools. The proposed method starts by mining specialized medical literature available on the Web to discover possible associations between genes and diseases. The discovered gene-disease associations are then explored by identifying abnormally expressed genes through microarray data analysis. Next, relevant gene networks are built by clustering these genes on the basis of the similarity of their expression profiles in the microarray data. Finally, the biological processes, cellular components and molecular functions that may play a role in the disease are identified by querying the Gene Ontology (GO) database. The methodology is illustrated by a case study on neuromuscular disorders.
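
The clustering step can be illustrated with a minimal sketch. It is not the tool described in the paper: it assumes Pearson correlation as the similarity measure, a fixed threshold of 0.9, and grouping by connected components, none of which the abstract specifies. The gene labels and expression values are made up for illustration.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length expression profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a flat profile carries no co-expression signal
    return cov / sqrt(var_x * var_y)


def coexpression_groups(profiles, threshold=0.9):
    """Link genes whose profiles correlate at or above `threshold`
    (an assumed cut-off) and return the connected components."""
    genes = sorted(profiles)
    neighbours = {g: set() for g in genes}
    for g1, g2 in combinations(genes, 2):
        if pearson(profiles[g1], profiles[g2]) >= threshold:
            neighbours[g1].add(g2)
            neighbours[g2].add(g1)

    groups, seen = [], set()
    for gene in genes:
        if gene in seen:
            continue
        # Depth-first walk over the similarity graph from this gene.
        stack, component = [gene], set()
        while stack:
            current = stack.pop()
            if current in component:
                continue
            component.add(current)
            stack.extend(neighbours[current] - component)
        seen |= component
        groups.append(sorted(component))
    return groups


# Hypothetical gene labels with invented expression values across four arrays.
toy_profiles = {
    "GENE_A": [1.0, 2.0, 3.0, 4.0],
    "GENE_B": [2.1, 4.0, 6.2, 8.0],
    "GENE_C": [4.0, 3.0, 2.0, 1.0],
    "GENE_D": [8.0, 6.1, 4.0, 2.1],
}
groups = coexpression_groups(toy_profiles)
print(groups)
```

On this toy input the two rising profiles end up in one group and the two falling profiles in another. Each resulting group could then serve as a gene list for a Gene Ontology lookup, as in the final step of the method.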
