A large set of 26 new reference transcriptomes dedicated to comparative population genomics in crops and wild relatives

We produced a large and unique data set of reference transcriptomes to provide new insights into the evolution of plant genomes and crop domestication. To this end, we validated an RNA‐Seq assembly protocol for comparative population genomics. For this validation, we assessed and compared the quality of de novo Illumina short‐read assemblies using data from two crops for which an annotated reference genome was available, namely grapevine and sorghum. We then used the same protocol to release 26 new transcriptomes of crop plants and wild relatives, including still understudied crops such as yam, pearl millet and fonio. The species list is taxonomically broad, comprising 15 monocots and 11 eudicots. All contigs were annotated using BLAST, prot4EST and Blast2GO. A distinctive feature of the data set is that each crop is paired with closely related species, which will enable whole‐genome comparative evolutionary studies between crops and their wild relatives. This large resource will thus serve research communities working on both crops and model organisms. All the data are available at http://arcad-bioinformatics.southgreen.fr/.
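The abstract does not say which metrics were used to compare the de novo assemblies against the grapevine and sorghum reference genomes. As one illustration of what reference-based validation can look like, the sketch below computes, for each annotated reference transcript, the fraction of its length covered by contig alignments, and then the share of reference transcripts recovered above a coverage threshold. The function names, the input layout (BLAST-style subject start/end coordinates plus a separate table of reference lengths) and the 0.8 default threshold are all assumptions for illustration, not the authors' protocol.

```python
from collections import defaultdict


def reference_coverage(hits, ref_lengths):
    """Hypothetical metric: fraction of each reference transcript covered by contigs.

    hits: iterable of (contig_id, ref_id, sstart, send) with 1-based inclusive
          coordinates on the reference; sstart > send marks a reverse-strand hit,
          as in BLAST tabular output.
    ref_lengths: dict mapping ref_id -> transcript length in bases (assumed input).
    """
    intervals = defaultdict(list)
    for _contig, ref, s, e in hits:
        intervals[ref].append((min(s, e), max(s, e)))

    coverage = {}
    for ref, length in ref_lengths.items():
        covered = 0
        cur_lo = cur_hi = None
        # Merge overlapping or adjacent intervals so no base is counted twice.
        for lo, hi in sorted(intervals.get(ref, [])):
            if cur_hi is None or lo > cur_hi + 1:
                if cur_hi is not None:
                    covered += cur_hi - cur_lo + 1
                cur_lo, cur_hi = lo, hi
            else:
                cur_hi = max(cur_hi, hi)
        if cur_hi is not None:
            covered += cur_hi - cur_lo + 1
        coverage[ref] = covered / length
    return coverage


def fraction_recovered(coverage, min_cov=0.8):
    """Share of reference transcripts whose coverage reaches min_cov (assumed threshold)."""
    if not coverage:
        return 0.0
    return sum(1 for c in coverage.values() if c >= min_cov) / len(coverage)


if __name__ == "__main__":
    hits = [("c1", "T1", 1, 60), ("c2", "T1", 50, 100), ("c3", "T2", 200, 101)]
    ref_lengths = {"T1": 100, "T2": 400, "T3": 50}
    cov = reference_coverage(hits, ref_lengths)
    print(cov)                      # {'T1': 1.0, 'T2': 0.25, 'T3': 0.0}
    print(fraction_recovered(cov))  # 0.333...
```

Merging intervals before summing avoids inflating coverage when several contigs (for example, fragmented isoforms) align to the same region of one reference transcript.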
