BGBlast: A BLAST Grid Implementation with Database Self-Updating and Adaptive Replication

BLAST is probably the most widely used application in bioinformatics. Its computational cost becomes a concern when both the query sequence sets and the reference databases are large. Here we present BGBlast, an approach to handling the computational complexity of large BLAST executions by porting BLAST to the Grid platform, leveraging the thousands of CPUs that make up the EGEE infrastructure. BGBlast provides innovative features for efficiently managing BLAST databases in a distributed Grid environment. The system (1) keeps the databases constantly up to date while still allowing users to revert to earlier versions, (2) stores older database versions on the Grid using a time- and space-efficient delta encoding, and (3) manages the number of replicas of each database across the Grid with an adaptive algorithm that dynamically balances execution parallelism against storage costs.
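The abstract does not specify BGBlast's delta format. The following is a minimal sketch of one common way to combine points (1) and (2). It assumes line-oriented *reverse* deltas: the latest database version is kept whole, and each older version is stored only as the changes needed to recover it from its successor. The deltas are computed with Python's standard `difflib`. The names `VersionedDatabase`, `make_reverse_delta` and `apply_delta` are hypothetical and do not come from BGBlast.

```python
from difflib import SequenceMatcher
from typing import List, Tuple

# Hypothetical delta format (not BGBlast's): a list of (start, end, lines)
# edits on the NEWER version. Each edit replaces newer[start:end] with `lines`.
Delta = List[Tuple[int, int, List[str]]]


def make_reverse_delta(newer: List[str], older: List[str]) -> Delta:
    """Record only the regions where `older` differs from `newer`."""
    matcher = SequenceMatcher(a=newer, b=older, autojunk=False)
    return [(i1, i2, older[j1:j2])
            for tag, i1, i2, j1, j2 in matcher.get_opcodes()
            if tag != "equal"]


def apply_delta(newer: List[str], delta: Delta) -> List[str]:
    """Rebuild the older version from `newer` plus its reverse delta."""
    out: List[str] = []
    pos = 0
    for start, end, replacement in delta:  # edits arrive in ascending order
        out.extend(newer[pos:start])       # copy the unchanged stretch
        out.extend(replacement)            # substitute the older content
        pos = end
    out.extend(newer[pos:])
    return out


class VersionedDatabase:
    """Latest version stored in full; older versions as reverse deltas."""

    def __init__(self, initial: List[str]) -> None:
        self.latest = list(initial)
        # reverse_deltas[i] turns version i + 1 back into version i.
        self.reverse_deltas: List[Delta] = []

    @property
    def latest_version(self) -> int:
        return len(self.reverse_deltas)

    def update(self, new_lines: List[str]) -> None:
        """Install a new release, keeping a delta back to the previous one."""
        self.reverse_deltas.append(make_reverse_delta(new_lines, self.latest))
        self.latest = list(new_lines)

    def checkout(self, version: int) -> List[str]:
        """Walk backwards from the latest version, one delta per step."""
        if not 0 <= version <= self.latest_version:
            raise ValueError(f"no such version: {version}")
        lines = self.latest
        for v in range(self.latest_version - 1, version - 1, -1):
            lines = apply_delta(lines, self.reverse_deltas[v])
        return list(lines)
```

This layout makes the common case cheap: fetching the current database needs no reconstruction. Reverting to an older version costs one delta application per step back. The storage for each old version is proportional to what changed between releases, not to the size of the database.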