Improving Computational Predictions of Cis- Regulatory Binding Sites

The location of cis-regulatory binding sites determine the connectivity of genetic regulatory networks and therefore constitute a natural focal point for research into the many biological systems controlled by such regulatory networks. Accurate computational prediction of these binding sites would facilitate research into a multitude of key areas, including embryonic development, evolution, pharmacogenemics, cancer and many other transcriptional diseases, and is likely to be an important precursor for the reverse engineering of genome wide, genetic regulatory networks. Many algorithmic strategies have been developed for the computational prediction of cis-regulatory binding sites but currently all approaches are prone to high rates of false positive predictions, and many are highly dependent on additional information, limiting their usefulness as research tools. In this paper we present an approach for improving the accuracy of a selection of established prediction algorithms. Firstly, it is shown that species specific optimization of algorithmic parameters can, in some cases, significantly improve the accuracy of algorithmic predictions. Secondly, it is demonstrated that the use of non-linear classification algorithms to integrate predictions from multiple sources can result in more accurate predictions. Finally, it is shown that further improvements in prediction accuracy can be gained with the use of biologically inspired post-processing of predictions.

[1]  Mathieu Blanchette,et al.  FootPrinter: a program designed for phylogenetic footprinting , 2003, Nucleic Acids Res..

[2]  G. Church,et al.  Computational identification of cis-regulatory elements associated with groups of functionally related genes in Saccharomyces cerevisiae. , 2000, Journal of molecular biology.

[3]  Nitesh V. Chawla,et al.  Classification and knowledge discovery in protein databases , 2004, J. Biomed. Informatics.

[4]  Nitesh V. Chawla,et al.  SMOTE: Synthetic Minority Over-sampling Technique , 2002, J. Artif. Intell. Res..

[5]  Kathleen Marchal,et al.  A higher-order background model improves the detection of promoter regulatory elements by Gibbs sampling , 2001, Bioinform..

[6]  Neil Davey,et al.  Integrating Binding Site Predictions Using Non-linear Classification Methods , 2004, Deterministic and Statistical Methods in Machine Learning.

[7]  Yoav Freund,et al.  A decision-theoretic generalization of on-line learning and an application to boosting , 1995, EuroCOLT.

[8]  Massimo Vergassola,et al.  Computational detection of genomic cis-regulatory modules applied to body patterning in the early Drosophila embryo , 2002, BMC Bioinformatics.

[9]  Jean-Marc Delosme,et al.  Performance of a new annealing schedule , 1988, 25th ACM/IEEE, Design Automation Conference.Proceedings 1988..

[10]  Eric H Davidson,et al.  New computational approaches for analysis of cis-regulatory networks. , 2002, Developmental biology.

[11]  William Stafford Noble,et al.  Assessing computational tools for the discovery of transcription factor binding sites , 2005, Nature Biotechnology.

[12]  H. Bussemaker,et al.  Regulatory element detection using correlation with expression , 2001, Nature Genetics.

[13]  E. Davidson Genomic Regulatory Systems: Development and Evolution , 2005 .

[14]  Stefano Lonardi,et al.  Efficient Detection of Unusual Words , 2000, J. Comput. Biol..

[15]  Charles Elkan,et al.  Fitting a Mixture Model By Expectation Maximization To Discover Motifs In Biopolymer , 1994, ISMB.