Parallel hybrid algorithm for global optimization of problems occurring in MDS-based visualization

A. ZILINSKAS AND J. ZILINSKAS

Keywords: Multidimensional scaling, Metaheuristics, Global optimization, Parallel algorithms.

*Author to whom all correspondence should be addressed. The authors acknowledge the support of Lithuanian State Science and Studies Foundation. The work of the second author is supported by the NATO Reintegration Grant CBP.EAP.RIG.981300 and the HPC-Europa programme, funded under the European Commission's Research Infrastructures activity of the Structuring the European Research Area programme, contract number RII3-CT-2003-506079.

© 2006 Elsevier Ltd. All rights reserved. doi:10.1016/j.camwa.2006.08.016

1. INTRODUCTION

Multidimensional scaling (MDS) is an exploratory technique for data analysis [1-3]. A set of $n$ objects is considered, assuming that pairwise dissimilarities of the objects are given by the matrix $(\delta_{ij})$. Depending on the application, dissimilarities are determined by different empirical or computational methods: e.g., dissimilarities between several brands of cars can be evaluated by comparing their technical characteristics, dissimilarities between tastes of several wines can be measured by subjective evaluations of a group of testers, etc. The $m$-dimensional image of the objects is a set of points in the embedding $m$-dimensional space, $x_i = (x_{i1}, \dots, x_{im})$, $i = 1, \dots, n$. A set of points in the embedding space is sought whose interpoint distances fit the given dissimilarities $\delta_{ij}$; we assume that $\delta_{ij} > 0$ and $\delta_{ij} = \delta_{ji}$.

It is well known that under mild conditions there exist points in $(n-1)$-dimensional space whose pairwise distances are equal to the dissimilarities; see, e.g., [2]. Therefore the original data can be considered a set of points in a multidimensional vector space. We want to find an image of such a set in a low-dimensional embedding space. Since MDS is frequently used to visualize sets of points of multidimensional vector spaces, the two-dimensional embedding space ($m = 2$) is of special interest.

The implementation of an MDS method is reduced to minimization of a fitness criterion, e.g., the so-called STRESS function

$$S(X) = \sum_{i<j}^{n} w_{ij} \left( d_{ij}(X) - \delta_{ij} \right)^2, \tag{1}$$

where $X = ((x_{11}, \dots, x_{n1}), \dots, (x_{1m}, \dots, x_{nm}))$; it is supposed that the weights are positive: $w_{ij} > 0$, $i, j = 1, \dots, n$; $d_{ij}(X)$ denotes the distance between the points $x_i$ and $x_j$. A norm in $R^m$ should be chosen for calculation of distances. Most often a Minkowski distance is used:

$$d_{ij}(X) = \left( \sum_{k=1}^{m} \left| x_{ik} - x_{jk} \right|^{r} \right)^{1/r}. \tag{2}$$

Formula (2) defines the Euclidean distances when $r = 2$, and the city block distances when $r = 1$. The points $x_i$ defined by means of minimization of (1) but using different distances in the embedding space can be interpreted as different nonlinear projections of the objects from the original to the embedding space.

MDS is a difficult global optimization problem. Although STRESS is defined by an analytical formula that seems rather simple, its minimization is difficult. The function normally has many local minima. The minimization problem is high-dimensional: the number of variables is $N = n \times m$. Nondifferentiability of STRESS normally cannot be ignored. On the other hand, MDS-based visualization is useful in the development of interactive global optimization methods [4].

The systematic application of global optimization in MDS was initiated in [5], where majorization-based local search was combined with globalization of the descent by means of tunnelling. Further publications by different authors show a great diversity of properties of MDS-related global optimization problems: depending on the data, different algorithms have been shown to be more or less appropriate for solving MDS problems. For example, the advantages of genetic algorithms in MDS with nonstandard stress (fitness) criteria are discussed in [6].
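As a concrete reading of formulas (1) and (2), the following minimal Python sketch evaluates STRESS for a small configuration. The function names, the default of unit weights, and the toy data are assumptions made for illustration only; they are not taken from the paper.

```python
from itertools import combinations


def minkowski(a, b, r=2.0):
    """Minkowski distance of order r between two points, as in formula (2)."""
    return sum(abs(ak - bk) ** r for ak, bk in zip(a, b)) ** (1.0 / r)


def stress(points, delta, weights=None, r=2.0):
    """STRESS value (1): weighted squared misfit summed over all pairs i < j.

    Unit weights are used when `weights` is None (an assumption of this sketch).
    """
    total = 0.0
    for i, j in combinations(range(len(points)), 2):
        w = 1.0 if weights is None else weights[i][j]
        d = minkowski(points[i], points[j], r)
        total += w * (d - delta[i][j]) ** 2
    return total


# Hypothetical toy data: three objects, all pairwise dissimilarities equal to 1.
delta = [[0, 1, 1], [1, 0, 1], [1, 1, 0]]
# Three collinear image points: distances are 1, 1 and 2.
line = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]

print(stress(line, delta, r=2))  # Euclidean distances
print(stress(line, delta, r=1))  # city block distances
```

In this toy case only the pair of end points is misfitted (distance 2 against dissimilarity 1), so both calls evaluate to a STRESS of 1; the two norms coincide here because the points lie on a coordinate axis.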
The testing results in [7,8] show that the hybrid algorithm combining evolutionary global search with efficient local descent is the most reliable, though the most time-consuming, method for MDS with Euclidean distances; the test data used represent typical visualization problems well. In the present paper we investigate the efficiency and reliability of a combination of evolutionary global search with local descent in MDS applications to visualize multidimensional data. Sets of points in multidimensional vector spaces are visualized in the two-dimensional embedding space with Euclidean and city block norms. Parallelization of the algorithm is applied to cope with the long solution time of usual uniprocessor implementations.

2. BASIC VERSION OF HYBRID ALGORITHM

For the minimization of (1) a hybrid algorithm has been implemented combining a genetic algorithm (similar to that used in [9]) at the upper level and a local minimization algorithm at the lower level. The pseudocode of the algorithm is outlined below. The upper-level genetic algorithm ensures globality of the search. Local descent at the lower level ensures efficient search for local minima. From the point of view of evolutionary optimization the algorithm consists of the following 'genetic operators': random (with uniform distribution) selection of parents, two-point crossover, adaptation to environment (modelled by local minimization), and elitist survival. Interpreting the vector of variables in (1) as a chromosome, the crossover operator is defined by the following formula:

$$X = \operatorname{arg\_min\_from}\bigl( (\hat{x}_{11}, \dots, \hat{x}_{\xi_1 1}, \tilde{x}_{\xi_1+1\,1}, \dots, \tilde{x}_{\xi_2-1\,1}, \hat{x}_{\xi_2 1}, \dots, \hat{x}_{n1}), \dots, (\hat{x}_{1m}, \dots, \hat{x}_{\xi_1 m}, \tilde{x}_{\xi_1+1\,m}, \dots, \tilde{x}_{\xi_2-1\,m}, \hat{x}_{\xi_2 m}, \dots, \hat{x}_{nm}) \bigr),$$

where $X$ is the chromosome of the offspring; $\hat{X}$ and $\tilde{X}$ are chromosomes of the selected parents; $\xi_1$, $\xi_2$ are two integer random numbers with uniform distribution over $1, \dots, n$; and it is supposed that the parent $\hat{X}$ is better fitted than the parent $\tilde{X}$ with respect to the value of STRESS.
arg_min_from(Z) denotes the operator that computes the local minimizer of (1) from the starting point Z.

The idea is to maintain a population of the best (with respect to the STRESS value) solutions whose crossover can generate better solutions. The population size $p$ is a parameter of the algorithm. An initial population is generated by performing local search from the $p$ starting points that are best (with respect to the STRESS value) in a sample of $N_{init}$ randomly generated points. The population evolves by generating offspring. Minimization terminates after a predetermined computing time $t_c$.

The Structure of the Hybrid Algorithm with Parameters $(p, N_{init}, t_c)$

Generate the initial population:
    Generate $N_{init}$ uniformly distributed random points.
    Perform search for local minima starting from the best $p$ generated points.
    Form the initial population from the found local minimizers.
while $t_c$ time has not passed
    Randomly with uniform distribution select two parents from the current population.
    Produce an offspring by means of crossover and local minimization.
    If the offspring is better fitted than the worst individual of the current population, then the offspring replaces the latter.

3. EXPERIMENTAL INVESTIGATION

Theoretical assessment of an optimization algorithm with respect to its efficiency in solving MDS problems is difficult. For example, a local minimization subroutine could be evaluated according to the local convergence rate widely used in optimization theory. But this criterion represents only one aspect of efficiency, and it is not sufficient to assess the general efficiency of global optimization. Therefore in this paper we investigate the efficiency of the proposed algorithm experimentally. To obtain results reproducible by other researchers we use local minimization subroutines from an easily accessible library [10].
A Sun Fire E15k computer is used for experimental investigation.
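
The structure of the hybrid algorithm with parameters $(p, N_{init}, t_c)$ described in Section 2 can be sketched in Python as follows. This is only an illustrative sketch under several assumptions that do not come from the paper: unit weights and Euclidean distances in STRESS, a simple compass (coordinate) search standing in for the library local minimizer of [10], random starting points drawn from $[-1, 1]^N$, two distinct parents per crossover, and crossover points sorted so that $\xi_1 \le \xi_2$. All function names are hypothetical.

```python
import random
import time
from itertools import combinations


def stress(x, delta, n, m):
    """STRESS (1) with unit weights and Euclidean distances (assumptions).
    x is a flat list ordered as in the text: (x_11..x_n1, ..., x_1m..x_nm)."""
    total = 0.0
    for i, j in combinations(range(n), 2):
        d = sum((x[k * n + i] - x[k * n + j]) ** 2 for k in range(m)) ** 0.5
        total += (d - delta[i][j]) ** 2
    return total


def local_min(f, x, step=0.5, tol=1e-3):
    """Stand-in for arg_min_from: compass search, not the routine of [10]."""
    x, fx = list(x), f(x)
    while step > tol:
        improved = False
        for k in range(len(x)):
            for s in (step, -step):
                y = x[:]
                y[k] += s
                fy = f(y)
                if fy < fx:
                    x, fx, improved = y, fy, True
                    break
        if not improved:
            step *= 0.5  # refine the step only when no coordinate move helps
    return x, fx


def crossover(better, worse, n, m, rng):
    """Two-point crossover: objects xi1+1 .. xi2-1 are taken from the worse
    parent, all other objects from the better parent."""
    xi1, xi2 = sorted(rng.randint(1, n) for _ in range(2))
    child = list(better)
    for k in range(m):
        for i in range(xi1 + 1, xi2):  # 1-based object indices
            child[k * n + i - 1] = worse[k * n + i - 1]
    return child


def hybrid(delta, m=2, p=4, n_init=50, t_c=0.2, seed=0):
    rng = random.Random(seed)
    n = len(delta)
    f = lambda x: stress(x, delta, n, m)
    # Initial population: local search from the best p of n_init random points.
    sample = [[rng.uniform(-1, 1) for _ in range(n * m)] for _ in range(n_init)]
    sample.sort(key=f)
    pop = sorted((fx, x) for x, fx in (local_min(f, s) for s in sample[:p]))
    deadline = time.monotonic() + t_c
    while time.monotonic() < deadline:
        a, b = sorted(rng.sample(pop, 2))  # a is the better-fitted parent
        child, fc = local_min(f, crossover(a[1], b[1], n, m, rng))
        if fc < pop[-1][0]:  # elitist survival: replace the worst individual
            pop[-1] = (fc, child)
            pop.sort()
    return pop  # list of (STRESS value, chromosome), best first


# Hypothetical test data: corners of a unit square, exactly embeddable in 2D.
r2 = 2 ** 0.5
delta = [[0, 1, r2, 1], [1, 0, 1, r2], [r2, 1, 0, 1], [1, r2, 1, 0]]
best_value, best_x = hybrid(delta)[0]
print("best STRESS found:", best_value)
```

Because termination is governed by wall-clock time, as in the outline, the number of generated offspring (and therefore the result) depends on the machine; replacing the deadline by an iteration count would make runs repeatable.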