High performance Grid computing for detecting gene-gene interactions in genome-wide association studies

The huge amount of biological information poses a major challenge for data analysis, particularly for combinatorial methods such as Multifactor Dimensionality Reduction (MDR). This method can be computationally intensive, especially when more than ten polymorphisms need to be evaluated. The Grid is a promising architecture for genomics problems because it provides high computing capabilities. In this paper, we describe a framework for supporting the MDR method in Grid environments. This framework helps biologists automate the execution of multiple gene-gene interaction detection tests. To evaluate the efficiency of the proposed framework, we conduct experiments on Grid'5000, a Grid infrastructure distributed across nine sites in France for research in large-scale parallel and distributed systems.
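To give a sense of why the search is combinatorial, the following minimal Python sketch counts the SNP combinations an exhaustive scan would have to evaluate and splits them into independent chunks, one per worker. It is an illustration under stated assumptions, not the framework described in this paper: the function names `count_models` and `partition`, the round-robin split, and the worker count are all hypothetical.

```python
from itertools import combinations
from math import comb


def count_models(n_snps, max_order):
    """Count all SNP combinations of order 2..max_order (hypothetical helper)."""
    return sum(comb(n_snps, k) for k in range(2, max_order + 1))


def partition(n_snps, order, n_workers):
    """Assign every `order`-way SNP combination to one of n_workers chunks.

    Round-robin assignment is an assumption made for illustration; each
    chunk could then be evaluated independently on a separate Grid node.
    """
    chunks = [[] for _ in range(n_workers)]
    for i, combo in enumerate(combinations(range(n_snps), order)):
        chunks[i % n_workers].append(combo)
    return chunks


if __name__ == "__main__":
    # The number of candidate models grows quickly with the number of SNPs.
    for n in (10, 100, 1000):
        print(n, count_models(n, 3))
```

Because each combination can be scored without reference to the others, the chunks are independent units of work, which is the property that makes this kind of analysis a natural fit for distributed execution.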
