PEACE: Parallel Environment for Assembly and Clustering of Gene Expression

We present PEACE, a stand-alone tool for high-throughput ab initio clustering of transcript fragment sequences produced by Next Generation or Sanger sequencing technologies. It is freely available from www.peace-tools.org. PEACE is installed and managed through a downloadable, user-friendly graphical user interface (GUI) and can process large data sets of transcript fragments of length 50 bases or greater, grouping the fragments by gene association with a sensitivity comparable to that of leading clustering tools. Once the fragments are clustered, the user can employ the GUI's analysis functions to collect statistics and to single out specific clusters for more comprehensive study or assembly. PEACE uses a novel minimum spanning tree-based clustering method and is the equal of leading tools in the literature, with an interface that makes it accessible to any user. It produces results of quality virtually identical to those of the WCD tool when applied to Sanger sequences, significantly improved results over WCD and TGICL when applied to the products of Next Generation sequencing technology, and significantly improved results over Cap3 in both cases. In short, PEACE combines an intuitive GUI with a feature-rich, parallel clustering engine, making it a valuable addition to the leading cDNA clustering tools.
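To make the idea of minimum spanning tree-based clustering concrete, the following is a minimal sketch and not PEACE's actual implementation. It assumes a simple Jaccard distance over k-mer sets as a stand-in for whatever sequence distance the tool uses, builds the tree with Prim's algorithm, and then drops every tree edge longer than a cutoff so that the remaining connected components form the clusters. The function names (`kmer_set`, `distance`, `prim_mst`, `mst_clusters`), the k-mer size, and the threshold value are all hypothetical choices made for illustration.

```python
def kmer_set(seq, k=4):
    """Return the set of all length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def distance(a, b, k=4):
    """Jaccard distance between k-mer sets (an assumed stand-in metric)."""
    ka, kb = kmer_set(a, k), kmer_set(b, k)
    union = ka | kb
    if not union:
        return 1.0
    return 1.0 - len(ka & kb) / len(union)


def prim_mst(seqs, k=4):
    """Build a minimum spanning tree with Prim's algorithm.

    Returns a list of (parent, child, weight) edges over sequence indices.
    """
    n = len(seqs)
    if n == 0:
        return []
    in_tree = [False] * n
    best = [float("inf")] * n   # cheapest known edge into each node
    parent = [-1] * n
    best[0] = 0.0               # start the tree at sequence 0
    edges = []
    for _ in range(n):
        # Pick the node outside the tree with the cheapest connecting edge.
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best[i])
        in_tree[u] = True
        if parent[u] != -1:
            edges.append((parent[u], u, best[u]))
        # Relax the edges from u to every node still outside the tree.
        for v in range(n):
            if not in_tree[v]:
                d = distance(seqs[u], seqs[v], k)
                if d < best[v]:
                    best[v] = d
                    parent[v] = u
    return edges


def mst_clusters(seqs, threshold=0.8, k=4):
    """Cut MST edges heavier than threshold; return clusters of indices."""
    n = len(seqs)
    root = list(range(n))

    def find(x):
        # Union-find lookup with path halving.
        while root[x] != x:
            root[x] = root[root[x]]
            x = root[x]
        return x

    for u, v, w in prim_mst(seqs, k):
        if w <= threshold:      # keep only edges short enough to join a cluster
            root[find(u)] = find(v)

    groups = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


if __name__ == "__main__":
    fragments = ["AAAAAAAAAA", "AAAAAAAAAAAA", "CCCCCCCCCC", "CCCCCCCCCCC"]
    print(mst_clusters(fragments))
```

This sketch computes all pairwise distances, so it scales quadratically with the number of fragments; a production tool would need heuristics or parallelism to avoid most of those comparisons on large data sets.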
