Discovering Genes-Diseases Associations From Specialized Literature Using the Grid

This paper proposes a novel method for text mining on the Grid that exposes hidden relationships for hypothesis generation and is suitable for semi-interactive querying. The method is based on unsupervised clustering, and its outputs are visualized together with contextual information. A Grid implementation is crucial to making the method feasible. We demonstrate the method with a mining run that discovers gene-disease associations from bibliographic sources and annotated databases. The methodology is designed with a view to a Grid architecture specialized in bioinformatics mining tasks. Some performance considerations are also provided.
