What do we know?: simple statistical techniques that help.

An understanding of simple statistical techniques is invaluable in science and in life. Yet despite the sophistication that many practitioners bring to the methods and algorithms of molecular modeling, statistical analysis in the field is rare and, where it does appear, often unconvincing. I present here some basic approaches that have proved useful in my own work, along with examples drawn from the field. In particular, I consider in detail the statistics of evaluating virtual screening.
