Sequencing of allotetraploid cotton (Gossypium hirsutum L. acc. TM-1) provides a resource for fiber improvement

Upland cotton is a model for polyploid crop domestication and transgenic improvement. Here we sequenced the allotetraploid Gossypium hirsutum L. acc. TM-1 genome by integrating whole-genome shotgun reads, bacterial artificial chromosome (BAC)-end sequences and genotype-by-sequencing genetic maps. We assembled and annotated 32,032 A-subgenome genes and 34,402 D-subgenome genes. Structural rearrangements, gene loss, disrupted genes and sequence divergence were more common in the A subgenome than in the D subgenome, suggesting asymmetric evolution. However, no genome-wide expression dominance was found between the subgenomes. Genomic signatures of selection and domestication are associated with positively selected genes (PSGs) for fiber improvement in the A subgenome and for stress tolerance in the D subgenome. This draft genome sequence provides a resource for engineering superior cotton lines.

Lei Fang | Ruiqiang Li | Daniel G Peterson | Don C Jones | Xiaoyang Xu | Tianzhen Zhang | Yue Tian | Baoliang Zhou | Brian E Scheffler | Zhi Jiang | Yan Hu | Mengqiao Pan | Wenxue Ye | Qingxin Song | Z Jeffrey Chen | Caiping Cai | Wangzhen Guo | Danny J Llewellyn | Elizabeth Dennis | Jinbo Zhang | Qun Wan | Lei Zhou | David M Stelly | Wenkai Jiang | Qiyang Zuo | Xueying Guan | Jiedan Chen | Christopher A Saski | Amanda M Hulse-Kemp | Bingliang Liu | Chunxiao Liu | Sen Wang | Yangkun Wang | Dawei Wang | Lijing Chang | Wenpan Zhang | Ryan C Kirkbride | Xiaoya Chen | Peggy Thaxton | Qiong Wang | Hua Zhang | Huaitong Wu | Gaofu Mei | Shuqi Chen | Dan Xiang | Xinghe Li | Jian Ding | Linna Tao | Yunchao Liu | Ji Li | Yu Lin | Yuanyuan Hui | Zhisheng Cao | Xiefei Zhu
