A lightweight, scalable grid computing framework for parallel bioinformatics applications

In recent years our society has witnessed unprecedented growth in the computing power available to tackle important problems in science, engineering and medicine. For example, the SHARCNET network links large computing resources at 11 leading academic institutions in South Central Ontario, providing access to thousands of compute processors. Developing efficient and scalable algorithms and methods for solving large scientific and engineering problems on such parallel and distributed computers remains a continuous challenge. If the computing power of such computational grids can be harnessed effectively and scalably, it becomes possible to solve large scientific problems that would otherwise be hard to solve on the available machines used in isolation. This paper describes techniques and software that make it possible to apply the power of computational grids to large-scale, loosely coupled parallel bioinformatics problems. Our approach is based on decentralization and implemented in Java, yielding a flexible, portable and scalable software solution for parallel bioinformatics. We discuss the advantages and disadvantages of this approach and demonstrate seamless performance on an ad-hoc grid composed of a wide variety of hardware for a real-life parallel bioinformatics problem. This problem consists of virtual experiments in RNA folding executed concurrently on hundreds of compute processors, experiments which may establish one of the missing links in the chain of events that led to the origin of life.