Identifying Binding Sites in Sequential Genomic Data

The identification of cis-regulatory binding sites in DNA is a difficult problem in computational biology. To obtain a full understanding of the complex machinery embodied in genetic regulatory networks it is necessary to know both the identity of the regulatory transcription factors together with the location of their binding sites in the genome. We show that using an SVM together with data sampling, to integrate the results of individual algorithms specialised for the prediction of binding site locations, can produce significant improvements upon the original algorithms. These results make more tractable the expensive experimental procedure of actually verifying the predictions.

[1]  Eric H Davidson,et al.  New computational approaches for analysis of cis-regulatory networks. , 2002, Developmental biology.

[2]  Kathleen Marchal,et al.  A higher-order background model improves the detection of promoter regulatory elements by Gibbs sampling , 2001, Bioinform..

[3]  Michael Q. Zhang,et al.  SCPD: a promoter database of the yeast Saccharomyces cerevisiae , 1999, Bioinform..

[4]  Massimo Vergassola,et al.  Computational detection of genomic cis-regulatory modules applied to body patterning in the early Drosophila embryo , 2002, BMC Bioinformatics.

[5]  Mathieu Blanchette,et al.  FootPrinter: a program designed for phylogenetic footprinting , 2003, Nucleic Acids Res..

[6]  Nello Cristianini,et al.  Controlling the Sensitivity of Support Vector Machines , 1999 .

[7]  Nitesh V. Chawla,et al.  SMOTE: Synthetic Minority Over-sampling Technique , 2002, J. Artif. Intell. Res..

[8]  Yudong D. He,et al.  Functional Discovery via a Compendium of Expression Profiles , 2000, Cell.

[9]  Walter R. Gilks,et al.  Transcription Binding Site Prediction Using Markov Models , 2006, J. Bioinform. Comput. Biol..

[10]  Dino Pedreschi,et al.  Machine Learning: ECML 2004 , 2004, Lecture Notes in Computer Science.

[11]  Stefano Lonardi,et al.  Efficient Detection of Unusual Words , 2000, J. Comput. Biol..

[12]  Neil Davey,et al.  Improving Computational Predictions of Cis- Regulatory Binding Sites , 2006, Pacific Symposium on Biocomputing.

[13]  Nitesh V. Chawla,et al.  Classification and knowledge discovery in protein databases , 2004, J. Biomed. Informatics.

[14]  Stephen Kwek,et al.  Applying Support Vector Machines to Imbalanced Datasets , 2004, ECML.

[15]  William Stafford Noble,et al.  Assessing computational tools for the discovery of transcription factor binding sites , 2005, Nature Biotechnology.

[16]  Charles Elkan,et al.  Fitting a Mixture Model By Expectation Maximization To Discover Motifs In Biopolymer , 1994, ISMB.