Breedbase: a digital ecosystem for modern plant breeding

Abstract: Modern breeding methods integrate next-generation sequencing and phenomics to identify plants with the best characteristics and greatest genetic merit for use as parents in subsequent breeding cycles, with the ultimate goal of creating improved cultivars that sustain high adoption rates among farmers. This data-driven approach hinges on strong foundations in data management, quality control, and analytics. Of crucial importance is a central database able to (1) track breeding materials, (2) store experimental evaluations, (3) record phenotypic measurements using consistent ontologies, (4) store genotypic information, and (5) implement algorithms for analysis, prediction, and selection decisions. Because the breeding process is complex, breeding databases also tend to be complex, difficult, and expensive to implement and maintain. Here, we present Breedbase (https://breedbase.org/, last accessed 4/18/2022), a breeding database system. Initiated as Cassavabase (https://cassavabase.org/, last accessed 4/18/2022) within the NextGen Cassava project (https://www.nextgencassava.org/, last accessed 4/18/2022) and later developed into a crop-agnostic system, it is presently used for dozens of crops and projects. The system is web-based and available as open-source software on GitHub (https://github.com/solgenomics/, last accessed 4/18/2022), and it is packaged as a Docker image for deployment (https://hub.docker.com/u/breedbase, last accessed 4/18/2022). Breedbase enables breeding programs to better manage and leverage their data for decision making within a fully integrated digital ecosystem.
