cisExpress: motif detection in DNA sequences

MOTIVATION One of the major challenges in contemporary bioinformatics is the analysis and accurate annotation of genomic datasets, so that useful information about the functional role of DNA sequences can be extracted. This article describes a novel genome-wide statistical approach for detecting specific DNA sequence motifs, based on similarities between the promoters of similarly expressed genes. The new tool, cisExpress, is designed specifically for large datasets, such as those generated by publicly accessible whole-genome and transcriptome projects. cisExpress uses a task-farming algorithm to exploit all available computational cores within a shared-memory node. We demonstrate that the proposed method is robust and valid. It can be applied to a wide range of genomic databases for any species of interest.

AVAILABILITY cisExpress is available at www.cisexpress.org.
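The scoring statistic and the task-farming scheme are only outlined here, so the following is a minimal Python sketch under stated assumptions, not the cisExpress implementation. It assumes that each candidate motif is a fixed-length DNA word (k-mer) and that a word is scored by a z-score comparing the mean expression of genes whose promoters contain it against the mean over all genes. The function names `all_words`, `score_word` and `scan`, the z-score itself, and the parameters `k` and `min_hits` are hypothetical choices for illustration. Task farming is mimicked with a `concurrent.futures` pool: each word is one independent task, and idle workers pick up the next one.

```python
"""Toy motif scan: score DNA words by the expression of genes whose promoters contain them.

Hypothetical illustration only; not the cisExpress statistic or implementation.
"""
from __future__ import annotations

import math
import os
from concurrent.futures import ThreadPoolExecutor
from functools import partial
from itertools import product
from statistics import mean, pstdev


def all_words(k: int) -> list[str]:
    """Enumerate every DNA word of length k over A, C, G, T (4**k words)."""
    return ["".join(p) for p in product("ACGT", repeat=k)]


def score_word(
    word: str,
    promoters: dict[str, str],
    expression: dict[str, float],
    mu: float,
    sigma: float,
) -> tuple[str, float, int]:
    """Return (word, z, n): how far the mean expression of the n genes whose
    promoter contains `word` lies above the overall mean, in standard errors."""
    hits = [expression[g] for g, seq in promoters.items() if word in seq]
    n = len(hits)
    if n == 0 or sigma == 0:
        return word, 0.0, n
    # Assumed normal approximation for the mean of n randomly chosen genes
    # (ignores the finite-population correction and the reverse strand).
    z = (mean(hits) - mu) / (sigma / math.sqrt(n))
    return word, z, n


def scan(
    promoters: dict[str, str],
    expression: dict[str, float],
    k: int = 4,
    min_hits: int = 2,
    workers: int | None = None,
) -> list[tuple[str, float, int]]:
    """Score all k-mers, farming one task per word out to a worker pool."""
    # Only genes with both a promoter sequence and an expression value count.
    promoters = {g: s.upper() for g, s in promoters.items() if g in expression}
    values = [expression[g] for g in promoters]
    mu, sigma = mean(values), pstdev(values)
    task = partial(score_word, promoters=promoters, expression=expression, mu=mu, sigma=sigma)
    # Task farm: idle workers pull the next word from the shared task list.
    with ThreadPoolExecutor(max_workers=workers or os.cpu_count() or 1) as pool:
        results = [r for r in pool.map(task, all_words(k)) if r[2] >= min_hits]
    # Highest z first; ties broken alphabetically for reproducible output.
    return sorted(results, key=lambda r: (-r[1], r[0]))


if __name__ == "__main__":
    promoters = {
        "g1": "AAAATATAGG",
        "g2": "CCTATACCCC",
        "g3": "GGGGCCCCGG",
        "g4": "CCCCGGGGCC",
    }
    expression = {"g1": 10.0, "g2": 9.0, "g3": 1.0, "g4": 2.0}
    for word, z, n in scan(promoters, expression)[:3]:
        print(f"{word}\tz={z:.2f}\tn={n}")
```

In this toy data, `TATA` is the only word shared by the two highly expressed genes, so it ranks first. The thread pool keeps the sketch self-contained; because CPython threads share the global interpreter lock, swapping in `ProcessPoolExecutor` would be needed to keep several cores busy with the same task-per-word structure.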
