Illumina sequencing technology as a method of identifying T-DNA insertion loci in activation-tagged Arabidopsis thaliana plants.

Dear Editor, Forward genetic screens are commonly used as unbiased tools to isolate genes responsible for a phenotype of interest. In Arabidopsis thaliana, T-DNA activation-tagging populations are especially frequently employed. These populations are generated using vectors containing multiple copies of the constitutive 35S promoter derived from cauliflower mosaic virus (35S CaMV) and often result in isolation of dominant gain-of-function alleles (Weigel et al., 2000; Nakazawa et al., 2003). This allows the study of members of large gene families that are often functionally redundant and, therefore, hard to identify in loss-of-function screens. Moreover, due to the dominant nature of these alleles, the phenotypes can usually be recognized in the T1 generation (Ostergaard and Yanofsky, 2004). Plasmid rescue and thermal asymmetric interlaced PCR (TAIL-PCR) have been effectively employed to recover plant-specific sequences flanking the T-DNA insertions (Weigel et al., 2000; Singer and Burke, 2003). However, in some instances, these two techniques do not yield the expected results, probably due to sequence complexities following integration events (Laufs et al., 1999). Today, Illumina sequencing is the most widely applied next-generation sequencing technology. It has been used, for example, to identify transposon insertions in Zea mays (Williams-Carrier et al., 2010), to obtain large transcript sequences in Sesamum indicum (Wei et al., 2011), and to sequence chloroplast genomes (Cronn et al., 2008). Here, we present how Illumina paired-end sequencing (sequencing of both the beginning and the end of a randomly generated DNA fragment) can be employed to identify T-DNA loci in activation-tagged Arabidopsis plants.
To identify molecular components involved in controlling hyponastic growth (stress-induced upward leaf movement; reviewed in Van Zanten et al., 2010) in Arabidopsis thaliana, we conducted a forward genetic screen on a population of plants tagged with a tetramer of 35S CaMV promoters (Weigel et al., 2000). Plants were screened for altered petiole angles at the start of the experiment (initial angle), after 6 h of ethylene exposure, and after 6 h in low light intensity conditions. A set of candidate lines with aberrant petiole angle phenotypes was isolated and designated SDS2, SDS4, DDD1, and EDD1 (Figure 1A and 1B; for designation code and all experimental procedures, see ‘Supplemental Methods’). We confirmed by segregation analysis that each line harbored only one T-DNA insertion (Supplemental Figure 1). To identify T-DNA insertion loci in the selected candidates, we first employed plasmid rescue and TAIL-PCR. Both methods repeatedly failed for these lines and, therefore, we adopted a novel approach using Illumina next-generation sequencing. Genomic DNA of the four lines was pooled and subjected to sequencing. A total of 20 419 624 paired 50-bp reads with an insert size of 204 ± 63 bases were obtained. The data generated by the Genome Analyzer IIx consisted of two files: one file contained the forward reads (+) and the second file contained the reverse reads (–). The files were in a proprietary format called SCARF (Solexa compact ASCII read format). First, the files were converted to the standard/Sanger FASTQ format. Subsequently, the forward and reverse reads were aligned separately to the reference sequence of the T-DNA (pSKI015; Weigel et al., 2000) and to the Arabidopsis genome, using the Bowtie aligner (Langmead et al., 2009). For each aligned read, the position on the chromosome and the orientation were recorded. The aligned reads were then paired based on their ID using R programming scripts (R Development Core Team).
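The pairing step can be illustrated with a minimal sketch. It is written in Python for illustration only and is not the R code used in this study. It assumes that each alignment has already been reduced to a hypothetical record of (reference name, position, strand) keyed by read ID. The read IDs, chromosome names, and coordinates below are invented example values. Mates that share an ID are joined, each pair is labeled by the references its two mates hit, and pairs with one mate on the T-DNA and one on the Arabidopsis genome are kept as candidate junction pairs.

```python
# Assumption: the T-DNA reference is named after the vector; any label would do.
TDNA_REF = "pSKI015"


def pair_by_id(forward_hits, reverse_hits):
    """Join forward and reverse alignments that share the same read ID.

    Reads whose mate did not align are dropped from the result.
    """
    shared_ids = forward_hits.keys() & reverse_hits.keys()
    return {rid: (forward_hits[rid], reverse_hits[rid]) for rid in sorted(shared_ids)}


def pair_type(fwd_hit, rev_hit):
    """Label a pair by how many of its two mates aligned to the T-DNA."""
    n_tdna = sum(hit[0] == TDNA_REF for hit in (fwd_hit, rev_hit))
    return ("Arabidopsis/Arabidopsis", "Arabidopsis/T-DNA", "T-DNA/T-DNA")[n_tdna]


def junction_candidates(pairs):
    """Keep only pairs with one mate on the genome and one on the T-DNA."""
    return {
        rid: mates
        for rid, mates in pairs.items()
        if pair_type(*mates) == "Arabidopsis/T-DNA"
    }


# Invented example alignments: read ID -> (reference, position, strand).
forward_hits = {
    "read1": ("Chr1", 1000, "+"),
    "read2": ("pSKI015", 50, "+"),
    "read3": ("Chr4", 777, "-"),
    "read4": ("pSKI015", 10, "+"),   # mate did not align
}
reverse_hits = {
    "read1": ("Chr1", 1180, "-"),
    "read2": ("Chr3", 5000, "-"),
    "read3": ("pSKI015", 300, "+"),
    "read5": ("Chr2", 42, "+"),      # mate did not align
}

pairs = pair_by_id(forward_hits, reverse_hits)
candidates = junction_candidates(pairs)

if __name__ == "__main__":
    for rid, (fwd, rev) in candidates.items():
        print(rid, fwd, rev)
```

In this sketch, the genomic coordinate of the Arabidopsis mate in each candidate pair approximates the insertion site only to within roughly one insert length; the exact breakpoint still requires reads that span the junction.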
As shown in Figure 1C, three types of pairing can occur: Arabidopsis/Arabidopsis, T-DNA/T-DNA, and Arabidopsis/T-DNA. T-DNA insertion loci can be detected from the last type (Arabidopsis/T-DNA). Unaligned reads were also of interest, as some could span the breakpoint and reveal its exact location. To detect such sequences, unaligned reads were searched with BLAST against the reference genomes, and reads containing sub-sequences that mapped to both Arabidopsis and the T-DNA were selected and analyzed. The pipeline used for searching the Arabidopsis/T-DNA paired reads detected three major breakpoints,