Clustering high-throughput sequencing data based on patterns of co-expression

This vignette outlines two approaches to clustering genes based on patterns of co-expression between samples, optionally accounting for the replicate structure of the data. The package can be installed as

> source("http://www.bioconductor.org/biocLite.R")
> biocLite("clusterSeq")
> library(clusterSeq)

We demonstrate these analyses on a set of time series data from female rat thymus tissue, taken from the Rat Bodymap project [1]. By identifying clusters of genes that show similar patterns of expression over time, we can identify patterns of time dependence within the data.

We first load the processed data, which contain the observed read counts at each gene for each sample.

> data(ratThymus, package = "clusterSeq")
> head(ratThymus)

We define the replicate structure of the data in a vector whose members correspond to the columns of the data matrix.

> replicates <- c("2week","2week","2week","2week",
+                 "6week","6week","6week","6week",
+                 "21week","21week","21week","21week",
+                 "104week","104week","104week","104week")

Library scaling factors are acquired here using the baySeq::getLibsizes function, but they might be acquired by any other means.

> library(baySeq)
> libsizes <- getLibsizes(data = ratThymus)
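As a rough illustration of what acquiring scaling factors by other means might look like, the sketch below computes two simple per-sample factors in base R and checks that a replicate vector lines up with the columns of the count matrix. It runs on a simulated matrix rather than the real data: toyCounts, toyReplicates, totalSizes and uqSizes are hypothetical names introduced for this example only. Neither calculation is claimed to reproduce what baySeq::getLibsizes does.

```r
# Simulated counts standing in for ratThymus: 200 genes x 16 samples.
# These values are hypothetical and used only for illustration.
set.seed(1)
toyCounts <- matrix(rpois(200 * 16, lambda = 50), nrow = 200, ncol = 16)

# Same replicate layout as described in the text: four samples per time point.
toyReplicates <- rep(c("2week", "6week", "21week", "104week"), each = 4)

# The replicate vector needs exactly one entry per column of the count matrix.
stopifnot(length(toyReplicates) == ncol(toyCounts))

# Option 1 (assumption): total read count per sample.
totalSizes <- colSums(toyCounts)

# Option 2 (assumption): upper quartile of the non-zero counts per sample.
uqSizes <- apply(toyCounts, 2, function(x) quantile(x[x > 0], probs = 0.75))
uqSizes <- unname(uqSizes)
```

Either vector has one value per sample, so it has the same shape as the libsizes object used in this vignette; whether such factors are appropriate depends on the data and should be judged case by case.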