ANDY: a general, fault-tolerant tool for database searching on computer clusters

SUMMARY
ANDY (seArch coordination aND analYsis) is a set of Perl programs and modules for distributing large biological database searches, and in general any sequence of commands, across the nodes of a Linux computer cluster. ANDY is compatible with several commonly used distributed resource management (DRM) systems, and it can be easily extended to new DRMs. A distinctive feature of ANDY is the choice of either dedicated or fair-use operation: ANDY is almost as efficient as single-purpose tools that require a dedicated cluster, but it can also run on a general-purpose cluster alongside any other jobs scheduled by a DRM. Other features include communication through named pipes for performance, flexible, customizable routines for error-checking and summarizing results, and multiple fault-tolerance mechanisms.

AVAILABILITY
ANDY is freely available and can be obtained from http://compbio.berkeley.edu/proj/andy.

SUPPLEMENTARY INFORMATION
Supplemental data, figures, and a more detailed overview of the software can be found at http://compbio.berkeley.edu/proj/andy.
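The general idea of distributing a sequence of commands with a simple fault-tolerance mechanism can be illustrated with a minimal sketch. This is not ANDY's implementation: ANDY is written in Perl and submits work through a DRM, whereas the sketch below uses Python, lets local threads stand in for cluster nodes, and treats "retry on a non-zero exit status" as the only recovery strategy. The names `run_with_retries` and `distribute`, and the parameters `workers` and `max_attempts`, are hypothetical and chosen only for illustration.

```python
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor


def run_with_retries(command, max_attempts=3):
    """Run one command, retrying while it exits with a non-zero status.

    Returns a tuple (command, attempts_used, succeeded).
    """
    for attempt in range(1, max_attempts + 1):
        result = subprocess.run(command, capture_output=True, text=True)
        if result.returncode == 0:
            return command, attempt, True
    return command, max_attempts, False


def distribute(commands, workers=4, max_attempts=3):
    """Run commands concurrently and return results in input order.

    Assumption: local threads stand in for cluster nodes; no DRM is involved.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda c: run_with_retries(c, max_attempts), commands))


if __name__ == "__main__":
    jobs = [
        [sys.executable, "-c", "print('search chunk 1')"],
        [sys.executable, "-c", "import sys; sys.exit(1)"],  # always fails
    ]
    for command, attempts, ok in distribute(jobs):
        print("ok" if ok else "failed", "after", attempts, "attempt(s)")
```

Each job here is an argument list rather than a shell string, which avoids quoting problems when commands are generated programmatically. A real deployment would replace the thread pool with submissions to the cluster's scheduler and would likely distinguish transient failures from permanent ones before retrying.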
