Building biomedical pipelines for large-scale sequencing analysis based on Galaxy and Cloud

With the rapid growth of high-throughput sequencing data, providing easy access to biomedical analysis tools and efficient data sharing and retrieval has become a significant challenge. Galaxy helps address this problem by providing an open, Web-based platform for accessible and reproducible genomic analysis. To meet the need for variable computing and storage resources, this paper deploys Galaxy on Cloud infrastructure, which offers on-demand resource allocation, auto-scaling, and pay-as-you-go pricing. We further extend Galaxy by adding user-specific analysis functions, providing reliable and high-performance data transfer capabilities, and enabling Cloud-based distributed computing for Galaxy jobs. A biomedical pipeline and a performance evaluation are presented to validate the effectiveness of the proposed approach.