Dear Editor, Forward genetic screens are commonly used as unbiased tools to isolate genes responsible for a phenotype of interest. In Arabidopsis thaliana, T-DNA activation tagging populations in particular are frequently employed. These populations are generated using vectors containing multiple copies of the constitutive 35S promoter derived from cauliflower mosaic virus (35S CaMV) and often result in the isolation of dominant gain-of-function alleles (Weigel et al., 2000; Nakazawa et al., 2003). This allows the study of members of large gene families that are often functionally redundant and, therefore, hard to identify in loss-of-function screens. Moreover, due to their dominant nature, the phenotypes can usually be recognized in the T1 generation (Ostergaard and Yanofsky, 2004). Plasmid rescue and thermal asymmetric interlaced PCR (TAIL-PCR) have been effectively employed to recover plant-specific sequences flanking the T-DNA insertions (Weigel et al., 2000; Singer and Burke, 2003). However, in some instances, these two techniques do not yield the expected results, probably due to sequence complexities arising from the integration events (Laufs et al., 1999). Today, Illumina sequencing is the most widely applied next-generation sequencing technology. It has been used, for example, to identify transposon insertions in Zea mays (Williams-Carrier et al., 2010), to obtain large transcript sequences in Sesamum indicum (Wei et al., 2011), and to sequence chloroplast genomes (Cronn et al., 2008). Here, we present how Illumina paired-end sequencing (sequencing of both the beginning and the end of a randomly generated DNA fragment) can be employed to identify T-DNA loci in activation-tagged Arabidopsis plants.
To identify molecular components involved in controlling hyponastic growth in Arabidopsis thaliana (stress-induced upward leaf movement; reviewed in Van Zanten et al., 2010), we conducted a forward genetic screen on a population of plants tagged with a tetramer of 35S CaMV promoters (Weigel et al., 2000). Plants were screened for altered petiole angles at the start of the experiment (initial angle), after 6 h of ethylene exposure, and after 6 h in low light intensity conditions. A set of candidate lines with aberrant petiole angle phenotypes was isolated and designated SDS2, SDS4, DDD1, and EDD1 (Figure 1A and 1B; for designation code and all experimental procedures, see 'Supplemental Methods'). We confirmed by segregation analysis that each line harbored only one T-DNA insertion (Supplemental Figure 1). To identify T-DNA insertion loci in the selected candidates, we first employed plasmid rescue and TAIL-PCR. Both methods repeatedly failed for these lines and, therefore, we adopted a novel approach using Illumina next-generation sequencing. Genomic DNA of the four lines was pooled and subjected to sequencing. Paired 50-bp reads with an insert size of 204 ± 63 bases were obtained, with a total of 20 419 624 reads. The Genome Analyzer IIx generated a set of two files: one file contained the forward reads (+) and the second file contained the reverse reads (–). The files were in a proprietary format called SCARF (Solexa compact ASCII read format). First, the files were converted to the standard/Sanger FASTQ format. Subsequently, the forward and reverse reads were aligned separately to the reference sequence of the T-DNA (pSKI015; Weigel et al., 2000) and to the Arabidopsis genome, using the Bowtie aligner (Langmead et al., 2009). For each aligned read, the position on the chromosome and the orientation were recorded. The aligned reads were then paired based on their ID using R programming scripts (R Development Core Team).
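The pairing step can be illustrated with a minimal sketch. The original analysis used R scripts that are not shown here; the Python version below is only an illustration under stated assumptions. The function names, the read IDs, the positions, and the use of `pSKI015` as the name of the T-DNA reference sequence are all hypothetical, and each alignment is assumed to have already been parsed into a (reference, position, strand) tuple.

```python
from collections import Counter


def pair_alignments(forward_hits, reverse_hits):
    """Join forward and reverse alignments on read ID.

    Each argument maps a read ID to a (reference, position, strand) tuple.
    Only IDs that aligned in both files yield a pair.
    """
    pairs = {}
    for read_id, fwd in forward_hits.items():
        rev = reverse_hits.get(read_id)
        if rev is not None:
            pairs[read_id] = (fwd, rev)
    return pairs


def pair_type(fwd, rev, tdna_name="pSKI015"):
    """Label a pair by the references its two mates aligned to.

    tdna_name is an assumed sequence name for the T-DNA reference;
    every other reference name is treated as Arabidopsis.
    """
    kinds = sorted(
        "T-DNA" if hit[0] == tdna_name else "Arabidopsis" for hit in (fwd, rev)
    )
    return "/".join(kinds)


# Made-up alignments, for illustration only.
forward_hits = {
    "read1": ("Chr1", 1000, "+"),
    "read2": ("pSKI015", 50, "+"),
    "read3": ("Chr4", 7000, "+"),
    "read4": ("pSKI015", 300, "-"),  # mate did not align
}
reverse_hits = {
    "read1": ("Chr1", 1150, "-"),
    "read2": ("pSKI015", 210, "-"),
    "read3": ("pSKI015", 120, "-"),
    "read5": ("Chr2", 42, "+"),  # mate did not align
}

pairs = pair_alignments(forward_hits, reverse_hits)
counts = Counter(pair_type(fwd, rev) for fwd, rev in pairs.values())

# Pairs whose mates map to different references span a candidate junction.
junction_pairs = {
    read_id: pair
    for read_id, pair in pairs.items()
    if pair_type(*pair) == "Arabidopsis/T-DNA"
}
```

In this toy data set only `read3` has one mate on a chromosome and the other on the T-DNA, so it is the only pair kept in `junction_pairs`. Reads with a single aligned mate (`read4`, `read5`) are dropped at the pairing step.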
As shown in Figure 1C, three types of pairing can occur: Arabidopsis/Arabidopsis, T-DNA/T-DNA, and Arabidopsis/T-DNA. The detection of T-DNA insertion loci is possible based on the latter pairing (Arabidopsis/T-DNA). Unaligned reads were also of interest, as some reads could map over the breakpoint and give its exact location. In order to detect such sequences, unaligned reads were blasted against reference genomes and sub-sequences of the same read that mapped both to Arabidopsis and T-DNA were selected and analyzed. The pipeline used for searching the Arabidopsis/T-DNA paired reads detected three major breakpoints,
[1] J. Traas, et al. A chromosomal paracentric inversion associated with T-DNA integration in Arabidopsis. The Plant Journal: for Cell and Molecular Biology, 1999.
[2] R. Dixon, et al. Activation tagging in Arabidopsis. Plant Physiology, 2000.
[3] M. Matsui, et al. Activation tagging, a novel tool to dissect the functions of a gene family. The Plant Journal: for Cell and Molecular Biology, 2003.
[4] T. Singer, et al. High-throughput TAIL-PCR as a tool to identify DNA flanking insertions. Methods in Molecular Biology, 2003.
[5] M. Yanofsky, et al. Establishing gene function by mutagenesis in Arabidopsis thaliana. The Plant Journal: for Cell and Molecular Biology, 2004.
[6] Cole Trapnell, et al. Ultrafast and memory-efficient alignment of short DNA sequences to the human genome. Genome Biology, 2009.
[7] T. Mockler, et al. Multiplex sequencing of plant chloroplast genomes using Solexa sequencing-by-synthesis technology. Nucleic Acids Research, 2008.
[8] Nicholas Stiffler, et al. Use of Illumina sequencing to identify transposon insertions underlying mutant phenotypes in high-copy Mutator lines of maize. The Plant Journal: for Cell and Molecular Biology, 2010.
[9] J. Janssen, et al. On the Relevance and Control of Leaf Angle. 2010.
[10] Linhai Wang, et al. Characterization of the sesame (Sesamum indicum L.) global transcriptome using Illumina paired-end sequencing and development of EST-SSR markers. BMC Genomics, 2011.