Docking of flexible molecules using multiscale ligand representations.

Structural genomics will yield an immense number of protein three-dimensional structures in the near future. Automated theoretical methodologies are needed to exploit this information and are likely to play a pivotal role in drug discovery. Here, we present a fully automated, efficient docking methodology that requires no a priori knowledge of the location of the binding site or the function of the protein. The method relies on a multiscale concept in which a hierarchy of models is generated for the potential ligand. The models are created using the k-means clustering algorithm. The method was tested on seven protein-ligand complexes. For the largest complex, human immunodeficiency virus reverse transcriptase/nevirapine, the root mean square deviation between our result and the crystal structure was 0.29 Å. We demonstrate on an additional 25 protein-ligand complexes that the methodology may be applicable to high-throughput docking. This work reveals three striking results. First, a ligand can be docked using a very small number of feature points. Second, with a multiscale concept, the number of conformers that need to be generated can be significantly reduced. Third, fully flexible ligands can be treated as a small set of rigid k-means clusters.
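
As an illustration of the multiscale idea, the following minimal Python sketch builds a hierarchy of coarse ligand models by running k-means on atom coordinates for k = 1, 2, ..., and computes a root mean square deviation between two matched point sets. It is not the authors' implementation: the abstract does not say which ligand points are clustered, how clusters are initialised, or how the deviation is computed, so the choice of atom coordinates as input, the farthest-point initialisation, the unsuperposed RMSD, and the names `kmeans`, `multiscale_models`, and `rmsd` are all assumptions made here for illustration.

```python
import math


def kmeans(points, k, iterations=50):
    """Plain Lloyd's k-means on 3D points; returns a list of k centroids."""
    points = [tuple(p) for p in points]
    # Assumption: deterministic farthest-point initialisation (not from the source).
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(
            max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        )
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: each centroid moves to the mean of its members;
        # an empty cluster keeps its previous centroid.
        updated = [
            tuple(sum(axis) / len(members) for axis in zip(*members))
            if members else centroids[i]
            for i, members in enumerate(clusters)
        ]
        if updated == centroids:
            break
        centroids = updated
    return centroids


def multiscale_models(atoms, max_points):
    """Hierarchy of coarse models: k feature points for k = 1..max_points."""
    return {k: kmeans(atoms, k) for k in range(1, max_points + 1)}


def rmsd(a, b):
    """RMSD between two equally sized, already matched point sets (no superposition)."""
    return math.sqrt(sum(math.dist(p, q) ** 2 for p, q in zip(a, b)) / len(a))


# Hypothetical toy "ligand": two well-separated groups of three atoms each.
toy_atoms = [(0, 0, 0), (1, 0, 0), (0, 1, 0), (10, 0, 0), (11, 0, 0), (10, 1, 0)]
models = multiscale_models(toy_atoms, max_points=3)
```

Here `models[1]` holds the single centroid of the whole toy ligand and `models[2]` holds one feature point per group. In a coarse-to-fine scheme of this kind, the smallest models would presumably be used first for a cheap search, with the finer models refining the candidate placements.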