Visualization of large multidimensional data sets by using multi-core CPU, GPU and MPI cluster

Multidimensional scaling (MDS) is a popular and reliable method for feature extraction and visualization of multidimensional data. The role of MDS is to reconstruct the topology of an original N-dimensional feature space, consisting of M feature vectors, in a target 2-D (or 3-D) Euclidean space. This can be achieved by minimizing the error “stress” function F(||D-d||), where D and d are the MxM dissimilarity matrices in the original and in the target spaces, respectively. However, the stress function is in general a multimodal and multidimensional function, for which the complexity of finding the global minimum increases exponentially with the number of data items. We employ here a robust heuristic based on the discrete particle method, which enables interactive visualization of data for various types of stress functions. Nevertheless, due to at least O(M^2) memory and time complexity, the method becomes computationally demanding when applied to interactive visualization of data sets involving M~10. We present here efficient parallel algorithms developed for various small and pre-medium computer architectures, from single and multi-core processors to GPUs and multiprocessor MPI clusters. The timings obtained show that the computational efficiency of the CUDA implementation of MDS on a PC equipped with a powerful GPU board (Tesla M2050 or GeForce 480) is two times greater than that of its MPI equivalent run on 10 nodes (10 x 2 x Intel Xeon X5670 = 120 threads) of a professional multiprocessor cluster (HP SL390). We also show that the hybridized two-level MPI/CUDA implementation, run on a small cluster of GPU nodes, can additionally provide a linear speed-up.
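
To make the stress minimization described above concrete, the following is a minimal serial sketch in Python, not the authors' implementation. It assumes the simplest raw stress, the sum over pairs i<j of (D_ij - d_ij)^2, and replaces the discrete particle method with plain overdamped relaxation (gradient descent), in which every pair of points interacts through a force proportional to D_ij - d_ij. The function names (pairwise_distances, stress, relax) and the parameters steps and lr are hypothetical and chosen only for illustration.

```python
import math
import random


def pairwise_distances(points):
    """Return the full M x M Euclidean distance matrix of a list of points."""
    m = len(points)
    return [[math.dist(points[i], points[j]) for j in range(m)] for i in range(m)]


def stress(D, y):
    """Raw stress (assumed form): sum over pairs i<j of (D_ij - d_ij)^2."""
    m = len(y)
    total = 0.0
    for i in range(m):
        for j in range(i + 1, m):
            total += (D[i][j] - math.dist(y[i], y[j])) ** 2
    return total


def relax(D, y, steps=100, lr=0.02):
    """Overdamped pairwise relaxation, i.e. gradient descent on the raw stress.

    This is a stand-in for the particle heuristic, not a reproduction of it.
    The step lr is assumed small relative to 1/M; larger values may diverge.
    """
    m, dim = len(y), len(y[0])
    for _ in range(steps):
        forces = [[0.0] * dim for _ in range(m)]
        for i in range(m):
            for j in range(i + 1, m):
                d = math.dist(y[i], y[j])
                if d == 0.0:
                    continue  # coincident points: direction undefined, skip pair
                # Positive when the pair is too close in the target space
                # (repulsion), negative when it is too far apart (attraction).
                scale = (D[i][j] - d) / d
                for k in range(dim):
                    f = scale * (y[i][k] - y[j][k])
                    forces[i][k] += f
                    forces[j][k] -= f
        y = [[y[i][k] + lr * forces[i][k] for k in range(dim)] for i in range(m)]
    return y


if __name__ == "__main__":
    random.seed(0)
    X = [[random.random() for _ in range(5)] for _ in range(6)]   # M=6, N=5
    D = pairwise_distances(X)
    y0 = [[random.random() for _ in range(2)] for _ in range(6)]  # 2-D target
    y1 = relax(D, y0)
    print("stress before:", stress(D, y0), "after:", stress(D, y1))
```

The double loop over pairs makes both the distance matrix and each relaxation step O(M^2), which is the cost that motivates the parallel implementations discussed in the abstract; the pairwise force accumulation is also the part that would be distributed across threads, GPU blocks, or MPI ranks.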
