Using Randomised Vectors in Transcription Factor Binding Site Predictions

Locating transcription factor binding sites in DNA is a difficult problem. Although the locations of some binding sites have been identified experimentally, the remaining parts of the genome may or may not contain binding sites, so regions labeled as negative cannot be assumed to be true negatives. This makes the negative data unreliable when training a classifier. Here we show that using randomized negative data gives a large boost in classifier performance compared with using the original labeled data.
