Soft Computing Methods for Big Data Problems

Big data computing generally deals with massive, high-dimensional data such as DNA microarray data, financial data, medical imagery, satellite imagery, and hyperspectral imagery. It therefore needs advanced technologies or methods that reduce computational time while extracting valuable information without information loss. In this context, machine learning (ML) algorithms have generally been used to learn and find useful information in large volumes of data. However, ML algorithms such as neural networks are computationally expensive, and a central processing unit (CPU) alone is typically unable to cope with their requirements. Faster hardware, such as the graphics processing unit (GPU), is therefore needed. GPUs provide remarkable performance gains compared to CPUs, and they are relatively inexpensive, widely available, and scalable. Since 2006, NVIDIA has simplified the GPU programming model with the Compute Unified Device Architecture (CUDA), which provides accessible programming interfaces and supports industry-standard languages such as C and C++. General-purpose computing on graphics processing units (GPGPU) has since been used to run ML algorithms in various applications, including signal and image pattern classification in the biomedical area. The importance of quickly distinguishing cancer from non-cancer samples motivates this study. Accordingly, we propose two soft computing methods, the self-organizing map (SOM) and multiple back-propagation (MBP), for big data, particularly for biomedical classification problems. Big data such as gene expression datasets are processed both on a high-performance computer and on Fermi-architecture graphics hardware. In the experiments, MBP and SOM running on a Tesla GPU achieve faster computing times than the high-performance computer, with feasible results in terms of both speed and classification performance.
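
As a rough illustration of the SOM named above, the following is a minimal sketch of Kohonen's online training rule in plain Python. It assumes Euclidean distance, a Gaussian neighbourhood on a rectangular grid, and a linearly decaying learning rate and radius; the function names `train_som`, `bmu_of` and `sq_dist` and all parameter defaults are hypothetical and are not taken from the authors' GPU implementation. The best-matching-unit (BMU) search computes one independent distance per neuron, which is the kind of data-parallel work that GPU versions of the SOM typically spread across threads.

```python
import math
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def bmu_of(weights, x):
    """Return the grid position of the neuron whose weight vector is closest to x.

    Each distance is independent of the others, so this search is what a
    GPU implementation would typically evaluate in parallel.
    """
    return min(weights, key=lambda node: sq_dist(weights[node], x))


def train_som(data, rows, cols, epochs=20, lr0=0.5, sigma0=None, seed=0):
    """Online SOM training on a rows x cols grid (hypothetical helper).

    data: list of equal-length feature vectors (e.g. expression profiles).
    Returns a dict mapping (row, col) -> weight vector.
    """
    rng = random.Random(seed)
    dim = len(data[0])
    if sigma0 is None:
        sigma0 = max(rows, cols) / 2.0
    # Initialise every neuron with a copy of a randomly chosen sample.
    weights = {(r, c): list(rng.choice(data)) for r in range(rows) for c in range(cols)}
    total_steps = epochs * len(data)
    step = 0
    for _ in range(epochs):
        order = list(range(len(data)))
        rng.shuffle(order)
        for i in order:
            x = data[i]
            # Linearly decay the learning rate; keep a small floor on the radius.
            frac = step / total_steps
            lr = lr0 * (1.0 - frac)
            sigma = max(sigma0 * (1.0 - frac), 0.5)
            bmu = bmu_of(weights, x)
            for node, w in weights.items():
                # Gaussian neighbourhood based on distance on the grid, not in data space.
                grid_d2 = (node[0] - bmu[0]) ** 2 + (node[1] - bmu[1]) ** 2
                h = math.exp(-grid_d2 / (2.0 * sigma * sigma))
                # Kohonen update: move w towards x by lr * h.
                for k in range(dim):
                    w[k] += lr * h * (x[k] - w[k])
            step += 1
    return weights


if __name__ == "__main__":
    # Toy data: two well-separated 2-D clusters standing in for two classes.
    rng = random.Random(1)
    cluster_a = [[rng.uniform(-1, 1), rng.uniform(-1, 1)] for _ in range(20)]
    cluster_b = [[10 + rng.uniform(-1, 1), 10 + rng.uniform(-1, 1)] for _ in range(20)]
    som = train_som(cluster_a + cluster_b, rows=2, cols=2)
    print("BMU of cluster A centre:", bmu_of(som, [0.0, 0.0]))
    print("BMU of cluster B centre:", bmu_of(som, [10.0, 10.0]))
```

After training, samples from the two clusters are expected to map to different neurons, which is how a trained SOM can then be labelled and used for classification. This sketch updates one sample at a time; GPU-oriented work often favours batch training instead, because a batch update accumulates contributions from many samples before changing the weights.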