Paper information - Normalization of RNA-Seq

Normalization of RNA-Seq

Usually, an RNA-Seq data analysis “from scratch” starts with a set of FASTQ files (see e.g. http://en.wikipedia.org/wiki/FASTQ_format), which contain both the sequence and the base quality of the short reads. Several tools align the reads to a reference genome (e.g. Bowtie, TopHat, GSNAP, Stampy). A common output format is SAM/BAM (described at http://samtools.sourceforge.net/). You just saw how to align reads when you don’t have a genome, and how to summarize them. When you do have a genome, a standard approach is to align the reads with Bowtie or TopHat and then summarize them over “regions of interest”, such as genes, exons, non-coding RNAs, etc. To do this, you need your aligned reads and an annotation for your reference genome. There are tools and packages that summarize the aligned reads into gene counts. One of them is HTSeq (http://www-huber.embl.de/users/anders/HTSeq/doc/overview.html). The simple command:
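To make the summarization step concrete, here is a minimal Python sketch of counting aligned reads per gene. It does not use HTSeq and is not the command referred to above. The gene intervals, the read coordinates, the function name `count_reads_per_gene`, and the `__no_feature` / `__ambiguous` labels are all hypothetical choices made for illustration. The sketch assumes a read is assigned to a gene only when it overlaps exactly one gene, with other reads tallied separately; real tools offer several such rules and also handle strand, gapped alignments, and mapping quality.

```python
from collections import Counter


def count_reads_per_gene(genes, reads):
    """Count aligned reads per gene with a simple overlap rule.

    genes: dict mapping gene id -> (chrom, start, end), 0-based half-open.
    reads: iterable of (chrom, start, end) alignments, 0-based half-open.
    Both inputs are hypothetical stand-ins for an annotation file and a
    SAM/BAM file.
    """
    # Start every annotated gene at zero so genes without reads are reported.
    counts = Counter({gene_id: 0 for gene_id in genes})
    for chrom, r_start, r_end in reads:
        # Collect every gene on the same chromosome whose interval overlaps the read.
        hits = [
            gene_id
            for gene_id, (g_chrom, g_start, g_end) in genes.items()
            if g_chrom == chrom and r_start < g_end and g_start < r_end
        ]
        if len(hits) == 1:
            counts[hits[0]] += 1          # unambiguous assignment
        elif not hits:
            counts["__no_feature"] += 1   # read overlaps no annotated gene
        else:
            counts["__ambiguous"] += 1    # read overlaps more than one gene
    return counts


# Toy annotation and alignments (made-up coordinates).
genes = {"geneA": ("chr1", 100, 500), "geneB": ("chr1", 450, 900)}
reads = [
    ("chr1", 120, 170),  # inside geneA only
    ("chr1", 460, 510),  # overlaps geneA and geneB
    ("chr1", 600, 650),  # inside geneB only
    ("chr2", 10, 60),    # no annotated gene on chr2
]
counts = count_reads_per_gene(genes, reads)
for name, value in sorted(counts.items()):
    print(name, value)
```

The resulting table of per-gene counts is the kind of input that normalization and differential expression methods then work with.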

Davide Risso
