Parallelization of nonuniform loops in supercomputers with distributed memory
A template algorithm is constructed for the parallel execution of independent loop iterations on a multiprocessor computer with distributed memory. Regardless of the number of processors, the algorithm must use computing capacity efficiently even when iterations differ substantially in complexity and/or processors differ substantially in performance. Interprocessor communication and control of the parallel computations are assumed to be implemented with the Message Passing Interface (MPI) standard, which is widely used in such systems. Existing loop parallelization methods are analyzed, and their efficiencies are estimated empirically for various models of iteration nonuniformity.
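To illustrate why nonuniform iterations matter, the following is a minimal simulation sketch, not the algorithm constructed in the paper. It compares a static block split of the loop with a simple dynamic scheme in which the next chunk of iterations goes to whichever processor becomes free first. The function names, the `chunk` parameter, and the cost model are assumptions made for illustration. Communication costs are ignored, and efficiency is taken here as total work divided by the product of aggregate processor speed and elapsed time.

```python
import heapq


def static_block_makespan(costs, speeds):
    """Elapsed time when iterations are split into contiguous, equal-count blocks."""
    p, n = len(speeds), len(costs)
    finish = []
    for rank in range(p):
        lo = rank * n // p
        hi = (rank + 1) * n // p
        # Time for this processor = its share of work / its relative speed.
        finish.append(sum(costs[lo:hi]) / speeds[rank])
    return max(finish)


def dynamic_makespan(costs, speeds, chunk=1):
    """Elapsed time when each chunk is handed to the first processor to become free."""
    # Min-heap of (time at which the processor becomes free, rank).
    free_at = [(0.0, rank) for rank in range(len(speeds))]
    heapq.heapify(free_at)
    makespan = 0.0
    for start in range(0, len(costs), chunk):
        t, rank = heapq.heappop(free_at)
        t += sum(costs[start:start + chunk]) / speeds[rank]
        makespan = max(makespan, t)
        heapq.heappush(free_at, (t, rank))
    return makespan


def efficiency(costs, speeds, makespan):
    """Fraction of the aggregate computing capacity spent on useful work."""
    return sum(costs) / (sum(speeds) * makespan)


if __name__ == "__main__":
    # Hypothetical workload: 12 cheap iterations followed by 4 expensive ones.
    costs = [1] * 12 + [10] * 4
    speeds = [1, 1, 1, 1]  # four identical processors
    for name, span in [
        ("static", static_block_makespan(costs, speeds)),
        ("dynamic", dynamic_makespan(costs, speeds)),
    ]:
        print(name, span, efficiency(costs, speeds, span))
```

In this hypothetical workload the static split leaves one processor with all four expensive iterations, so the loop takes 40 time units, while the dynamic scheme finishes in 13. In an MPI program the dynamic scheme would typically be realized by a coordinator process answering work requests, and the message cost per chunk, which this sketch omits, is what limits how small `chunk` can usefully be.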