Time complexity analysis for distributed memory computers: implementation of a parallel conjugate gradient method

New developments in Computer Science, both hardware and software, offer researchers, such as physicists, unprecedented possibilities to solve their computationally intensive problems.

However, fully exploiting, for example, new massively parallel computers, parallel languages, or runtime environments requires an approach that combines elements of computer science, numerical mathematics and, in our case, physics. We call this b-computer science.

Here we present an example of a computationally intensive physical application: the coupled dipole formulation of elastic light scattering from arbitrarily shaped particles. The computational kernel of this method is a large set of linear equations. We solve this set by means of a Conjugate Gradient (CG) method, implemented on a coarse-grain distributed memory computer (a Transputer network).

This paper describes the parallelization of a CG method. Two important choices are discussed: which decomposition of the problem is best, and which processor network topology is best suited to it. We introduce a general method to answer these questions and investigate its usefulness by applying it to our application. We conclude that implementing the CG method with a row-block decomposition of the coefficient matrix on a ring of Transputers is the most efficient choice.

Finally, the actual implementation is described, and preliminary experimental timing results are compared with the theoretical time complexity analysis.
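To make the row-block decomposition concrete, the following is a minimal sketch, not the implementation described in this paper. It simulates the processors inside a single Python process, and it assumes a real, symmetric positive definite matrix so that the textbook CG recurrence applies; the coupled dipole equations are complex-valued and may call for a different CG variant. The names `split_rows`, `block_matvec`, and `conjugate_gradient` are hypothetical and chosen for illustration only.

```python
def split_rows(n, p):
    """Return (start, stop) row ranges for p near-equal contiguous blocks."""
    base, extra = divmod(n, p)
    ranges, start = [], 0
    for rank in range(p):
        stop = start + base + (1 if rank < extra else 0)
        ranges.append((start, stop))
        start = stop
    return ranges


def block_matvec(A, x, ranges):
    """Row-block matrix-vector product.

    Each simulated processor owns the rows in its range and computes only
    its own slice of the result. Concatenating the slices stands in for
    the gather step that real processors would perform by communication.
    """
    y = []
    for start, stop in ranges:
        for i in range(start, stop):
            y.append(sum(a * xj for a, xj in zip(A[i], x)))
    return y


def dot(u, v):
    """Inner product; on a real machine this needs a global sum."""
    return sum(a * b for a, b in zip(u, v))


def conjugate_gradient(A, b, p=2, tol=1e-10, max_iter=1000):
    """Textbook CG for a real SPD matrix A, using the row-block matvec."""
    n = len(b)
    ranges = split_rows(n, p)
    x = [0.0] * n
    r = list(b)          # residual b - A x for the initial guess x = 0
    d = list(r)          # search direction
    rr = dot(r, r)
    for _ in range(max_iter):
        if rr ** 0.5 < tol:
            break
        Ad = block_matvec(A, d, ranges)
        alpha = rr / dot(d, Ad)
        x = [xi + alpha * di for xi, di in zip(x, d)]
        r = [ri - alpha * adi for ri, adi in zip(r, Ad)]
        rr_new = dot(r, r)
        d = [ri + (rr_new / rr) * di for ri, di in zip(r, d)]
        rr = rr_new
    return x
```

In this sketch the only steps that would require communication on a distributed memory machine are assembling the full vector for the matrix-vector product and summing the partial inner products; the vector updates are purely local to each block.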