Prediction of DNA-binding propensity of proteins by the ball-histogram method using automatic template search

We contribute a novel ball-histogram approach to predicting the DNA-binding propensity of proteins. Unlike state-of-the-art methods, which construct an ad-hoc set of features describing physicochemical properties of the proteins, the ball-histogram technique enables a systematic Monte-Carlo exploration of the spatial distribution of amino acids that comply with automatically selected properties. This exploration yields a model for predicting DNA-binding propensity. We validate our method in prediction experiments, where it improves on state-of-the-art accuracies. The method also provides interpretable features that capture the spatial distributions of selected amino acids.
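To illustrate the idea of a Monte-Carlo exploration of spatial amino-acid distributions, the following is a minimal sketch, not the authors' implementation. It assumes that a protein is given as a list of (residue type, 3D coordinate) pairs, that a "template" is simply a tuple of residue types to count, and that balls are centred on randomly chosen residues; the function name `ball_histogram` and all parameters are hypothetical.

```python
import random
from collections import Counter

def ball_histogram(residues, template, radius, n_samples=1000, seed=0):
    """Estimate how often each combination of template residue counts
    occurs inside a randomly placed ball (illustrative sketch only).

    residues: list of (residue_type, (x, y, z)) tuples
    template: tuple of residue types whose counts define a histogram bin
    radius:   ball radius, in the same units as the coordinates
    """
    rng = random.Random(seed)
    bins = Counter()
    r2 = radius ** 2
    for _ in range(n_samples):
        # Assumption: the ball is centred on a randomly chosen residue.
        _, (cx, cy, cz) = rng.choice(residues)
        inside = Counter()
        for kind, (x, y, z) in residues:
            if kind in template and (x - cx) ** 2 + (y - cy) ** 2 + (z - cz) ** 2 <= r2:
                inside[kind] += 1
        # One bin per tuple of counts, ordered as in the template.
        bins[tuple(inside[k] for k in template)] += 1
    # Normalise sample counts to relative frequencies.
    return {key: count / n_samples for key, count in bins.items()}

# Toy input: two nearby charged residues and one distant glycine.
toy = [("ARG", (0.0, 0.0, 0.0)), ("LYS", (1.0, 0.0, 0.0)), ("GLY", (10.0, 0.0, 0.0))]
hist = ball_histogram(toy, template=("ARG", "LYS"), radius=2.0, n_samples=500)
```

Each resulting frequency could serve as one feature of a protein for a downstream classifier; how templates are selected automatically is not covered by this sketch.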
