The future of legume genetic data resources: Challenges, opportunities, and priorities

Legumes, comprising one of the largest, most diverse, and most economically important plant families, are the subject of vibrant research and development worldwide. Continued improvement of legume crops will benefit from the recent proliferation of genetic (including genomic) resources, but the diversity, scale, and complexity of these resources present challenges to those managing and using them. A workshop held in March 2019 addressed questions of data resources and priorities for the legumes. The workshop identified the following needs and recommendations: (a) Develop strategies to effectively store, integrate, and relate genetic resources collected in different projects. (b) Leverage information collected across many legume species by standardizing data formats and ontologies, improving the state of metadata about datasets, and increasing use of the FAIR data principles. (c) Advocate for the critical role that curators play in integrating complex datasets into databases and adding high-value metadata that enable downstream analytics and facilitate practical applications. (d) Implement standardized software and database development practices to best leverage limited developer time and the expertise gained from the various legume (and other) species. (e) Develop tools and databases that can manage genetic information for the world's plant genetic resources, enabling efficient incorporation of important traits into breeding programs. (f) Centralize information on databases, tools, and training materials, and establish funding streams to support training and outreach.
