Biology Needs Evolutionary Software Tools: Let’s Build Them Right

Abstract

Research in population genetics and evolutionary biology has always provided a computational backbone for the life sciences as a whole. Today, evolutionary and population-biology reasoning is essential for interpreting the large, complex datasets that characterize every domain of the life sciences, from cancer biology to microbial ecology. This makes the algorithms and software tools developed by our community more important than ever. It also means that we, the developers of software tools for molecular evolutionary analyses, now share a responsibility to make these tools accessible using modern technological developments and to provide adequate documentation and training.
