A graphical approach to tracking and reporting target status in structural genomics

Determination of a protein structure requires a series of decisions and processes, proceeding from target selection through cloning, expression, and purification to structure determination. Structural genomics projects may distribute these steps among several different groups of researchers. Although this division of labor may lower the cost per solved structure, it creates distinct challenges for integrating information on the progress of a given target and passing it across several functional divisions. Laboratory information management systems (LIMS) are essential for gathering this information, but they may not display the progress of a given target intuitively. In addition, structural genomics projects funded by the Protein Structure Initiative (PSI) are obliged to disseminate data regularly to the TargetDB and PepcDB data repositories, which requires specialized views of the data. We report here how the flow of a target through a structural genomics pipeline, and the reports to TargetDB and PepcDB, can be abstracted as directed acyclic graphs or trees. To implement this kind of display, we created software that tracks the activity leading toward protein structure determination and prepares XML reports as input to TargetDB and PepcDB. The target-tracing software consists of a set of Perl CGI scripts that integrate with the Graphviz visualization system to provide a graphical, user-friendly Web interface. The database reporting software, also written in Perl, transfers large-scale genomics data from our LIMS into an XML file that can be reported to PepcDB. This software package has facilitated inter-group communication, improved the quality and accuracy of information in our LIMS, and made our reports to PepcDB more efficient and accurate.
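The idea of abstracting a target's history as a tree or directed acyclic graph and handing it to Graphviz can be illustrated with a short sketch. This is not the authors' Perl implementation. It is a minimal Python illustration assuming a hypothetical `Node` record per pipeline event (target, clone, expression trial, and so on), a hypothetical set of status values, and a hypothetical colour scheme. It emits Graphviz DOT text, and a set of already-visited nodes lets a child shared by two parents (the DAG case) be declared only once.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical pipeline event, e.g. a target, clone, or expression trial."""
    node_id: str
    label: str
    status: str  # hypothetical values: "done", "failed", "in_progress"
    children: list[Node] = field(default_factory=list)


# Hypothetical mapping from status to a Graphviz (X11) fill colour.
STATUS_COLOR = {"done": "palegreen", "failed": "salmon", "in_progress": "lightyellow"}


def to_dot(root: Node) -> str:
    """Render a target's history (tree or DAG) as Graphviz DOT text."""
    lines = ["digraph target {", "  rankdir=LR;", "  node [shape=box, style=filled];"]
    seen: set[str] = set()
    stack = [root]
    while stack:
        node = stack.pop()
        if node.node_id in seen:
            continue  # a shared child in a DAG is declared (and expanded) only once
        seen.add(node.node_id)
        color = STATUS_COLOR.get(node.status, "white")
        lines.append(f'  "{node.node_id}" [label="{node.label}", fillcolor={color}];')
        for child in node.children:
            # One edge per parent-child pair, from each parent's single visit
            lines.append(f'  "{node.node_id}" -> "{child.node_id}";')
            stack.append(child)
    lines.append("}")
    return "\n".join(lines)


# Hypothetical example: one target, one clone, two expression trials.
example = Node("T1", "Target T1", "done", [
    Node("C1", "Clone C1", "done", [
        Node("E1", "Expression E1", "failed"),
        Node("E2", "Expression E2", "in_progress"),
    ]),
])
print(to_dot(example))
```

Saving the printed text to a file and running `dot -Tsvg target.dot -o target.svg` would produce an image suitable for a Web page. Generating DOT text and leaving layout to Graphviz keeps the tracking code independent of how the graph is drawn.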