Kernel-based identification of regulatory modules.

Identifying cis-regulatory modules (CRMs) is an important milestone toward the ultimate goal of understanding transcriptional regulation in eukaryotic cells. Among other approaches, the problem has been tackled with motif-finding algorithms that identify overrepresented motifs in regulatory sequences. These methods succeed in finding single, well-conserved motifs but fail to identify combinations of degenerate binding sites, such as those often found in CRMs. We have developed a method that combines the strengths of existing motif-finding tools with the discriminative power of a machine learning technique to model the regulation of genes (Schultheiss et al. (2009) Bioinformatics 25, 2126-2133). Our software is called KIRMES, which stands for kernel-based identification of regulatory modules in euchromatic sequences. Starting from a set of genes thought to be co-regulated, KIRMES can identify the key CRMs responsible for this behavior and can determine, for any gene not included in that set, whether it is also regulated by the same mechanism. Such gene sets can be derived from microarray experiments, or from chromatin immunoprecipitation experiments combined with next-generation sequencing or with promoter or whole-genome microarrays (ChIP-seq and ChIP-chip). Because the approach relies on an established machine learning method, it is fast to use and robust to noise. Easily understood visualizations make the returned results interpretable and a starting point for further analysis. Even for complex regulatory relationships, KIRMES can be a helpful tool for directing the design of biological experiments.
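To make the general idea concrete (motif occurrences turned into features, then a kernel-based classifier trained on a co-regulated gene set versus background genes), here is a minimal sketch under stated assumptions. It is not the KIRMES implementation. It assumes the motifs have already been found by some motif-finding tool; the consensus strings used with it are hypothetical placeholders. Degenerate sites are approximated by tolerating up to `max_mismatches` mismatches, and a kernel perceptron is swapped in as a simpler stand-in for the support vector machines discussed in [4] and [15]. All names (`count_hits`, `features`, `poly2_kernel`, `KernelPerceptron`) are hypothetical. The degree-2 polynomial kernel is chosen because its implicit feature space contains products of motif counts, so the classifier can respond to motif co-occurrence rather than to single motifs.

```python
"""Toy kernel classifier over motif-occurrence features.

Hypothetical sketch, not the KIRMES implementation: motifs are assumed to be
already known, and a kernel perceptron stands in for a support vector machine.
"""
from typing import Callable, List, Sequence

Vector = List[float]


def count_hits(seq: str, motif: str, max_mismatches: int = 1) -> int:
    """Count windows of `seq` within `max_mismatches` (Hamming) of `motif`.

    Tolerating mismatches is a crude, assumed stand-in for degenerate sites.
    """
    k = len(motif)
    hits = 0
    for start in range(len(seq) - k + 1):
        window = seq[start:start + k]
        mismatches = sum(a != b for a, b in zip(window, motif))
        if mismatches <= max_mismatches:
            hits += 1
    return hits


def features(seq: str, motifs: Sequence[str], max_mismatches: int = 1) -> Vector:
    """Map a promoter sequence to one occurrence count per motif."""
    return [float(count_hits(seq, m, max_mismatches)) for m in motifs]


def poly2_kernel(x: Vector, z: Vector) -> float:
    """(1 + <x, z>)^2: its implicit features include pairwise products of
    motif counts, so the classifier can reward motif co-occurrence."""
    return (1.0 + sum(a * b for a, b in zip(x, z))) ** 2


class KernelPerceptron:
    """Dual-form perceptron: f(x) = sum_i alpha_i * y_i * K(x_i, x)."""

    def __init__(self, kernel: Callable[[Vector, Vector], float], epochs: int = 20):
        self.kernel = kernel
        self.epochs = epochs
        self.X: List[Vector] = []
        self.y: List[int] = []
        self.alpha: List[int] = []

    def decision(self, x: Vector) -> float:
        return sum(a * yi * self.kernel(xi, x)
                   for a, yi, xi in zip(self.alpha, self.y, self.X))

    def fit(self, X: List[Vector], y: List[int]) -> "KernelPerceptron":
        """Labels: +1 for the co-regulated set, -1 for background genes."""
        self.X, self.y, self.alpha = X, y, [0] * len(X)
        for _ in range(self.epochs):
            errors = 0
            for i, x in enumerate(X):
                # Misclassified or on the boundary: increase this example's weight.
                if y[i] * self.decision(x) <= 0:
                    self.alpha[i] += 1
                    errors += 1
            if errors == 0:  # every training gene is classified correctly
                break
        return self

    def predict(self, x: Vector) -> int:
        return 1 if self.decision(x) > 0 else -1
```

Trained on a promoter that carries both motifs (label +1) and promoters carrying only one motif or neither (label -1), the classifier then labels a new gene as co-regulated only when both motifs co-occur, mimicking, in a very reduced form, the question of whether a gene outside the original set is regulated by the same module.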

[1] Kathleen Marchal, et al. A higher-order background model improves the detection of promoter regulatory elements by Gibbs sampling. 2001, Bioinformatics.

[2] Jun S. Liu, et al. Detecting subtle sequence signals: a Gibbs sampling strategy for multiple alignment. 1993, Science.

[3] P. M. Das, et al. Chromatin immunoprecipitation assay. 2004, BioTechniques.

[4] William Stafford Noble, et al. Support vector machine. 2013.

[5] Yudong D. He, et al. Functional Discovery via a Compendium of Expression Profiles. 2000, Cell.

[6] J. Lieb, et al. ChIP-chip: considerations for the design, analysis, and application of genome-wide chromatin immunoprecipitation experiments. 2004, Genomics.

[7] Gunnar Rätsch, et al. Large Scale Multiple Kernel Learning. 2006, J. Mach. Learn. Res.

[8] M. L. Howard, et al. cis-Regulatory control circuits in development. 2004, Developmental Biology.

[9] Jun S. Liu, et al. De novo cis-regulatory module elicitation for eukaryotic genomes. 2005, Proceedings of the National Academy of Sciences of the United States of America.

[10] Alexander J. Hartemink, et al. A Fast, Alignment-Free, Conservation-Based Method for Transcription Factor Binding Site Discovery. 2008, RECOMB.

[11] A. Barski, et al. Genomic location analysis by ChIP-Seq. 2009, Journal of Cellular Biochemistry.

[12] Daniel J. Blankenberg, et al. Galaxy: a platform for interactive large-scale genome analysis. 2005, Genome Research.

[13] T. D. Schneider, et al. Sequence logos: a new way to display consensus sequences. 1990, Nucleic Acids Research.

[14] P. Walker, et al. Evolution of motif variants and positional bias of the cyclic-AMP response element. 2007, BMC Evolutionary Biology.

[15] Bernhard E. Boser, et al. A training algorithm for optimal margin classifiers. 1992, COLT '92.

[16] Mark Goadrich, et al. The relationship between Precision-Recall and ROC curves. 2006, ICML.

[17] Gunnar Rätsch, et al. RASE: recognition of alternatively spliced exons in C.elegans. 2005, ISMB.

[18] Gunnar Rätsch, et al. KIRMES: kernel-based identification of regulatory modules in euchromatic sequences. 2009, Bioinformatics.

[19] Gunnar Rätsch, et al. POIMs: positional oligomer importance matrices – understanding support vector machine-based signal detectors. 2008, ISMB.

[20] F. Robert, et al. Genome-wide computational prediction of transcriptional regulatory modules reveals new insights into human gene expression. 2006.