Screening Data for Phylogenetic Analysis of Land Plants: A Parallel Approach

Screening sequence data for phylogenetic analysis in large datasets is a well-known data-intensive computational problem. In this paper, we implement an approach to screening sequence data for the Platform for Phylogenetic Analysis of Land Plants (PALPP). The approach uses the MapReduce paradigm to parallelize the Basic Local Alignment Search Tool (BLAST) and manage its execution. It also uses machine virtualization to encapsulate the BLAST execution environment and commonly used datasets in flexibly deployable virtual machines. We implement two Hadoop-based BLAST methods and present an evaluation of the approach.
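In a MapReduce formulation of BLAST, the query sequences are commonly split into independent records. Map tasks search each record against a database that every worker can read, and a reduce step gathers the hits for each query. The Python sketch below simulates that data flow in a single process. It is not the paper's Hadoop implementation. The toy database, the shared k-mer count used in place of real BLAST scoring, and all function names are hypothetical and chosen only for illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple

# Hypothetical toy database standing in for a BLAST database replicated to all workers.
DATABASE: Dict[str, str] = {
    "subjA": "ATGCGTACGTTAG",
    "subjB": "GGGCCCATGCGTA",
}

Hit = Tuple[str, int]  # (subject id, score)


def parse_fasta(text: str) -> List[Tuple[str, str]]:
    """Split FASTA text into (id, sequence) records; each record acts as one input split."""
    records: List[Tuple[str, str]] = []
    name, seq = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                records.append((name, "".join(seq)))
            name, seq = line[1:].split()[0], []
        else:
            seq.append(line)
    if name is not None:
        records.append((name, "".join(seq)))
    return records


def kmer_score(query: str, subject: str, k: int) -> int:
    """Toy stand-in for a BLAST search: number of distinct k-mers shared by both sequences."""
    q = {query[i:i + k] for i in range(len(query) - k + 1)}
    s = {subject[i:i + k] for i in range(len(subject) - k + 1)}
    return len(q & s)


def map_phase(record: Tuple[str, str], k: int) -> Iterator[Tuple[str, Hit]]:
    """Map: search one query record against every subject and emit (query id, hit) pairs."""
    qid, seq = record
    for sid, sseq in DATABASE.items():
        score = kmer_score(seq, sseq, k)
        if score > 0:
            yield qid, (sid, score)


def shuffle(pairs: Iterable[Tuple[str, Hit]]) -> Dict[str, List[Hit]]:
    """Shuffle: group emitted values by key, as the framework does between map and reduce."""
    groups: Dict[str, List[Hit]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(qid: str, hits: List[Hit]) -> Tuple[str, List[Hit]]:
    """Reduce: order one query's hits by descending score, breaking ties by subject id."""
    return qid, sorted(hits, key=lambda h: (-h[1], h[0]))


def run_job(fasta_text: str, k: int = 4) -> Dict[str, List[Hit]]:
    """Run map, shuffle, and reduce over all query records in a single process."""
    mapped = (pair for rec in parse_fasta(fasta_text) for pair in map_phase(rec, k))
    grouped = shuffle(mapped)
    return dict(reduce_phase(q, h) for q, h in grouped.items())


if __name__ == "__main__":
    print(run_job(">q1\nATGCGTAC\n>q2\nTTTTTTTT\n"))
```

In this sketch, query `q1` shares five 4-mers with `subjA` and four with `subjB`. Query `q2` matches nothing, so it does not appear in the output. Because each query record is processed independently, the map step is the part that can be spread across Hadoop workers.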