Fully automated pipeline for detection of sex linked genes using RNA-Seq data

Background: Sex chromosomes are a genomic region that differs, to some extent, between the sexes of a single species. Reliable high-throughput methods for detecting sex chromosome-specific markers are needed, especially in species where genome information is limited. Next-generation sequencing (NGS) makes it possible to identify unique sequences and to search for nucleotide polymorphisms between datasets. Combining classical genetic segregation analysis with RNA-Seq data offers an ideal tool for mapping and identifying sex chromosome-specific expressed markers. To address this challenge, we established a genetic cross of the dioecious plant Rumex acetosa and generated RNA-Seq data from the parental generation and from male and female offspring.

Results: We present a pipeline for the detection of sex-linked genes based on nucleotide polymorphism analysis. In our approach, nucleotide polymorphisms are tracked through a cross, preferably between distant populations. With this design, only four datasets are needed: reads from a high-throughput sequencing platform for the parental generation (mother and father) and for the F1 generation (male and female progeny). The pipeline combines custom scripts with external assembly, mapping and variant calling software. Because the computation is resource-intensive, high-capacity servers are required. Therefore, to keep the pipeline easily accessible and reproducible, we implemented it in Galaxy, an open, web-based platform for data-intensive biomedical research. Our tools are available in the Galaxy Tool Shed, from which they can be installed on any local Galaxy instance. As output, the user receives a FASTA file of candidate transcriptionally active sex-linked genes, sorted by relevance, together with a BAM file containing the identified genes and the read alignments. Polymorphisms that follow the expected segregation pattern can thus be easily visualized, which significantly simplifies primer design and the subsequent steps of wet-lab verification.

Conclusions: Our pipeline is a simple and freely accessible software tool for identifying sex chromosome-linked genes in species without an existing reference genome. By combining genetic crosses with RNA-Seq data, we have designed a high-throughput, cost-effective approach for the broad community of scientists studying sex chromosome structure and evolution.
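The segregation-based filtering described in the Results can be illustrated with a minimal sketch. It is not the pipeline's actual code: the function name `classify_site`, the sample labels, and the classification rules are assumptions chosen for illustration. The sketch assumes an XY system in which the mother carries no copy of the father-specific allele, so that an allele seen only in the father and sons suggests Y linkage, and an allele seen only in the father and daughters suggests linkage to the paternal X.

```python
from typing import Dict, Set


def classify_site(alleles: Dict[str, Set[str]]) -> str:
    """Classify one polymorphic site from the alleles observed in four datasets.

    `alleles` maps the hypothetical labels "mother", "father", "daughters"
    and "sons" to the set of alleles called at this site in each dataset.
    The rules below are an illustrative assumption, not the published method.
    """
    mother, father = alleles["mother"], alleles["father"]
    daughters, sons = alleles["daughters"], alleles["sons"]

    # Only alleles that the father has and the mother lacks are informative.
    for allele in sorted(father - mother):
        in_sons = allele in sons
        in_daughters = allele in daughters
        if in_sons and not in_daughters:
            return "Y-linked candidate"  # passed from father to sons only
        if in_daughters and not in_sons:
            return "X-linked candidate"  # passed from father to daughters only
    return "not informative"


# Example: the father-specific allele G appears in sons but not in daughters.
site = {
    "mother": {"A"},
    "father": {"A", "G"},
    "daughters": {"A"},
    "sons": {"A", "G"},
}
print(classify_site(site))
```

In practice the allele sets would come from variant calls on reads mapped to the assembled transcripts, and a contig would be reported only when several of its sites agree. Thresholds for coverage and for the number of consistent sites are left out here because the abstract does not specify them.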
