Assembly of the 373k gene space of the polyploid sugarcane genome reveals reservoirs of functional diversity in the world's leading biomass crop

ABSTRACT

Background: Sugarcane cultivars are polyploid interspecific hybrids with giant genomes, typically carrying 10–13 sets of chromosomes from 2 Saccharum species. The ploidy, hybridity, and size of the genome, estimated at >10 Gb, pose a challenge for sequencing.

Results: Here we present a gene space assembly of SP80-3280, including 373,869 putative genes and their potential regulatory regions. The alignment of single-copy genes in diploid grasses to the putative genes indicates that we could resolve 2–6 (up to 15) putative homo(eo)logs that are 99.1% identical within their coding sequences. Dissimilarities increase in their regulatory regions, and gene promoter analysis shows differences in regulatory elements within gene families that are expressed in a species-specific manner. We exemplify these differences for sucrose synthase (SuSy) and phenylalanine ammonia-lyase (PAL), 2 gene families central to carbon partitioning. SP80-3280 has particular regulatory elements involved in sucrose synthesis that are not found in the ancestor Saccharum spontaneum. PAL regulatory elements are found in co-expressed genes related to fiber synthesis within gene networks defined during plant growth and maturation. Comparison with sorghum reveals predominantly bi-allelic variations in sugarcane, consistent with the formation of 2 “subgenomes” after their divergence ∼3.8–4.6 million years ago. It also reveals single-nucleotide variants that may underlie their differences.

Conclusions: This assembly represents a large step towards a whole-genome assembly of a commercial sugarcane cultivar. It includes a rich diversity of genes and homo(eo)logous resolution for a representative fraction of the gene space, relevant to improving biomass and food production.
