Parallelization of Machine Learning Methods by Using CUDA

Image analysis, data mining, protein folding and gene sequencing are examples of computationally intensive bioinformatics applications that require substantial computing resources. In this paper we present a computationally intensive methodology for microarray data analysis whose performance needs to be improved with high-performance computing techniques. Parallelization is a key technique for reducing the time required for the analysis and classification procedures. GPUs offer a high degree of parallelism and are designed for high-throughput processing of the large volumes of data that machine learning problems involve. We therefore propose a GPU-based model for parallelizing machine learning problems, intended to speed up several stages of the machine learning process.
