Optimisation For Vector And RISC Processors

Single node performance is a key issue in the optimisation of codes for massively parallel processors, especially for applications like ocean and atmospheric modelling where high parallel efficiency is easily obtained from the natural data locality. In this paper we demonstrate how specific optimisations to suit a particular target processor architecture can give significant performance benefits for a shallow sea model running on a range of high performance systems. A benchmark performance comparison between code optimised for vector processors and for scalar, cache-based processors shows a performance gain of up to a factor of 8.3 between the better and worse performing code. We also highlight the problems of maintaining a portable code that will perform well on both scalar and vector processors.