Paper information - Discovering Genes-Diseases Associations From Specialized Literature Using the Grid

This paper proposes a novel method for text mining on the Grid. The method aims to uncover hidden relationships for hypothesis generation and is suitable for semi-interactive querying. It is based on unsupervised clustering, and its outputs are visualized together with contextual information. A Grid implementation is crucial to the method's feasibility. We demonstrate the method with a mining run that discovers gene-disease associations from bibliographic sources and annotated databases. The methodology is designed with a view to a Grid architecture specialized in bioinformatics mining tasks. Some performance considerations are also provided.

Concetto Spampinato | Daniela Giordano | Francesco Maiorana | Alberto Faro
