Methods for high-throughput protein expression, purification and structure determination adapted for structural genomics

The Human Genome Project and other sequencing projects continue to identify thousands of genes encoding novel proteins of unknown function. The functional annotation of these proteins, a field usually referred to as functional genomics, has therefore taken centre stage in biological research. One approach to functional characterisation takes advantage of the dependence of protein function on three-dimensional structure, and has been termed structural genomics (or structural proteomics) (1). While structural information cannot always provide a complete and accurate functional assignment, recent analyses show that the structure gives some insight into protein function in 75% of cases (2). Integration of bioinformatic and experimental approaches promises further improvements in functional assignment (3). The principal aim of the worldwide structural genomics initiative is to provide a complete repertoire of protein folds by determining a representative structure (using X-ray crystallography or nuclear magnetic resonance) for each protein family (4). While individual consortia approach this goal in different ways and focus on different protein realms, the nature of the structural genomics initiative means that all share the need to implement high-throughput methodologies. The tactics include both the adoption of existing technologies in high-throughput mode and the development of new techniques at all stages of the protein structure determination pipeline, from gene cloning, protein expression, purification and structure determination to data management. When integrated, these methods could allow thousands of proteins to be fed through the pipeline in a highly automated fashion. In this paper, we briefly review recent methodological developments implemented by structural genomics initiatives, focusing in particular on recombinant protein expression in bacteria, purification and crystallization.
Many of these methodologies are not restricted to high-throughput approaches, large-scale initiatives or structure determination pipelines, and are equally applicable to smaller-scale projects and to alternative approaches to characterising the biochemical and cellular functions of proteins. Where appropriate, we comment on our own experience in applying some of the described methods to the structural and functional characterisation of proteins from macrophages (5,6).