Kronos: a workflow assembler for genome analytics and informatics

Background The field of next generation sequencing informatics has matured to a point where algorithmic advances in sequence alignment and individual feature detection methods have stabilized. Practical and robust implementation of complex analytical workflows (where such tools are structured into ‘best practices’ for automated analysis of NGS datasets) still requires significant programming investment and expertise. Results We present Kronos, a software platform for automating the development and execution of reproducible, auditable and distributable bioinformatics workflows. Kronos obviates the need for explicit coding of workflows by compiling a text configuration file into executable Python applications. The framework of each workflow includes a run manager to execute the encoded workflows locally (or on a cluster or cloud), parallelize tasks, and log all runtime events. Resulting workflows are highly modular and configurable by construction, facilitating flexible and extensible meta-applications which can be modified easily through configuration file editing. The workflows are fully encoded for ease of distribution and can be instantiated on external systems, promoting and facilitating reproducible research and comparative analyses. We introduce a framework for building Kronos components which function as shareable, modular nodes in Kronos workflows. Conclusion The Kronos platform provides a standard framework for developers to implement custom tools, reuse existing tools, and contribute to the community at large. Kronos is shipped with both Docker and Amazon AWS machine images. It is free, open source and available through PyPI (Python Package Index) and https://github.com/jtaghiyar/kronos.

[1]  Sven Rahmann,et al.  Genome analysis , 2022 .

[2]  Ola Spjuth,et al.  Experiences with workflows for automating data-intensive bioinformatics , 2015, Biology Direct.

[3]  Andrew I. Su,et al.  Omics Pipe: a community-based framework for reproducible multi-omics data analysis , 2015, Bioinform..

[4]  Joshua M. Stuart,et al.  Combining tumor genome simulation with crowdsourcing to benchmark somatic single-nucleotide-variant detection , 2015, Nature Methods.

[5]  Carole A. Goble,et al.  The Taverna workflow suite: designing and executing workflows of Web Services on the desktop, web or in the cloud , 2013, Nucleic Acids Res..

[6]  Thomas R. Gingeras,et al.  STAR: ultrafast universal RNA-seq aligner , 2013, Bioinform..

[7]  Bernard J. Pope,et al.  Bpipe: a tool for running and managing bioinformatics pipelines , 2012, Bioinform..

[8]  A. Bashashati,et al.  Integrative analysis of genome-wide loss of heterozygosity and monoallelic expression at nucleotide resolution reveals disrupted pathways in triple-negative breast cancer , 2012, Genome research.

[9]  Wendy S. W. Wong,et al.  Strelka: accurate somatic small-variant calling from sequenced tumor-normal sample pairs , 2012, Bioinform..

[10]  Pablo Cingolani,et al.  © 2012 Landes Bioscience. Do not distribute. , 2022 .

[11]  David R. Kelley,et al.  Differential gene and transcript expression analysis of RNA-seq experiments with TopHat and Cufflinks , 2012, Nature Protocols.

[12]  Leo Goodstadt,et al.  Ruffus: a lightweight Python library for computational pipelines , 2010, Bioinform..

[13]  A. Nekrutenko,et al.  Galaxy: a comprehensive approach for supporting accessible, reproducible, and transparent computational research in the life sciences , 2010, Genome Biology.

[14]  Moustafa Ghanem,et al.  Meta-workflows: pattern-based interoperability between Galaxy and Taverna , 2010, Wands '10.

[15]  Carole A. Goble,et al.  myExperiment: a repository and social network for the sharing of bioinformatics workflows , 2010, Nucleic Acids Res..

[16]  Richard Durbin,et al.  Sequence analysis Fast and accurate short read alignment with Burrows – Wheeler transform , 2009 .