Paper information - BioQueue: a novel pipeline framework to accelerate bioinformatics analysis

Motivation: With the rapid development of Next-Generation Sequencing, a large amount of data is now available for bioinformatics research. Meanwhile, the many existing pipeline frameworks make it possible to analyse these data. However, these tools concentrate mainly on their syntax and design paradigms, and they dispatch jobs based on the user's own experience of how many resources a given step in a protocol needs. As a result, it is difficult for these tools to make full use of the available computing resources and to avoid errors caused by overload, such as memory overflow.

Results: Here, we have developed BioQueue, a web-based framework that contains a checkpoint before each step to automatically estimate the system resources (CPU, memory and disk) needed by the step and then dispatch jobs accordingly. BioQueue uses a shell command-like syntax instead of implementing a new scripting language, which means that most biologists without a computer programming background can use the efficient queue system with ease.

Availability and implementation: BioQueue is freely available at https://github.com/liyao001/BioQueue. The extensive documentation can be found at http://bioqueue.readthedocs.io.

Contact: li_yao@outlook.com or gcsui@nefu.edu.cn

Supplementary information: Supplementary data are available at Bioinformatics online.
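The checkpoint idea in the abstract (estimate the CPU, memory and disk a step needs, then dispatch only if the machine can take it) can be illustrated with a minimal Python sketch. This is not BioQueue's actual code or estimation method: the `Resources` class, the `estimate` and `checkpoint` functions, and the "peak of past runs times a safety factor" rule are all hypothetical, chosen only to make the idea concrete.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Resources:
    """Hypothetical record of what a step uses or what the machine has free."""
    cpu: float        # CPU cores
    memory_gb: float  # RAM in GB
    disk_gb: float    # disk space in GB

    def fits_within(self, free: "Resources") -> bool:
        # A step fits only if every resource it needs is available.
        return (self.cpu <= free.cpu
                and self.memory_gb <= free.memory_gb
                and self.disk_gb <= free.disk_gb)


def estimate(history: List[Resources], safety: float = 1.2) -> Resources:
    """Assumed estimator: peak usage over past runs of the same step,
    scaled by a safety factor. The real tool may estimate differently."""
    if not history:
        raise ValueError("no past runs to estimate from")
    return Resources(
        cpu=max(r.cpu for r in history) * safety,
        memory_gb=max(r.memory_gb for r in history) * safety,
        disk_gb=max(r.disk_gb for r in history) * safety,
    )


def checkpoint(needed: Resources, free: Resources) -> str:
    """Decide before a step runs: dispatch it now or keep it queued."""
    return "dispatch" if needed.fits_within(free) else "wait"
```

In this sketch a step that does not fit stays queued instead of running and failing with a memory overflow, which is the overload error the abstract says experience-based dispatching struggles to avoid.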

Yuanyuan Song | Li Yao | Heming Wang | Guangchao Sui
