论文信息 - Mapping unstructured mesh codes onto local memory parallel architectures

Mapping unstructured mesh codes onto local memory parallel architectures

Initial work on mapping CFD codes onto parallel systems focused upon software which employed structured meshes. Increasingly, many large scale CFD codes are being based upon unstructured meshes. One of the key problems when implementing such large scale unstructured problems on a distributed memory machine is the question of how to partition the underlying computational domain efficiently. It is important that all processors are kept busy for as large a proportion of the time as possible and that the amount, level and frequency of communication should be kept to a minimum. Proposed techniques for solving the mapping problem have separated out the solution into two distinct phases. The first phase is to partition the computational domain into cohesive sub-regions. The second phase consists of embedding these sub-regions onto the processors. However, it has been shown that performing these two operations in isolation can lead to poor mappings and much less optimal communication time. In this thesis we develop a technique which simultaneously takes account of the processor topology whilst identifying the cohesive sub-regions. Our approach is based on an unstructured mesh decomposition method that was originally developed by Sadayappan et al [SER90] for a hypercube. This technique forms a basis for a method which enables a decomposition to an arbitrary number of processors on a specified processor network topology. Whilst partitioning the mesh, the optimisation method takes into account the processor topology by minimising the total interprocessor communication. The problem with this technique is that it is not suitable for dealing with very large meshes since the calculations often require prodigious amounts of computing processing power. The problem can be overcome by creating clusters of the original elements and using this to create a reduced network which is homomorphic to the original mesh. The technique can now be applied to the image network with comparative ease. The clusters are created using an efficient graph bisection method. The coarseness of the reduced mesh inevitably leads to a degradation of the solution. However, it is possible to refine the resultant partition to recapture some of the richness of the original mesh and hence achieve reasonable partitions. One of the issues to be addressed is the level of granuality to obtain the best balance between computational efficiency and optimality of the solution. Some progress has been made in trying to find an answer to this important issue. In this thesis, we show how the above technique can be effectively utilised in large scale computations. Results include testing the above technique on large scale meshes for complex flow domains.

Beryl Wyn Jones | B. W. Jones

[1] D. R. Fulkerson,et al. Flows in Networks. , 1964 .

[2] G. G. Alway,et al. An algorithm for reducing the bandwidth of a matrix of symmetrical configuration , 1965, Comput. J..

[3] Richard Rosen. Matrix bandwidth minimization , 1968, ACM National Conference.

[4] E. Cuthill,et al. Reducing the bandwidth of sparse symmetric matrices , 1969, ACM '69.

[5] Robin J. Wilson. Introduction to Graph Theory , 1974 .

[6] Richard M. Karp,et al. Theoretical Improvements in Algorithmic Efficiency for Network Flow Problems , 1972, Combinatorial Optimization.

[7] M. Fiedler. Algebraic connectivity of graphs , 1973 .

[8] Alex Pothen,et al. PARTITIONING SPARSE MATRICES WITH EIGENVECTORS OF GRAPHS* , 1990 .

[9] J. A. Bondy,et al. Graph Theory with Applications , 1978 .

[10] Harold S. Stone,et al. Multiprocessor Scheduling with the Aid of Network Flow Algorithms , 1977, IEEE Transactions on Software Engineering.

[11] David S. Johnson,et al. Computers and Intractability: A Guide to the Theory of NP-Completeness , 1978 .