Clustering and Assembling Large Transcriptome Datasets by EasyCluster2

EasyCluster is a well-established Python program developed to produce reliable clusters from expressed sequence tags (ESTs) in order to infer and improve gene structures and to discover potential alternative splicing events. Here we present EasyCluster2, a reimplementation of EasyCluster in the Java programming language that can manage genome-scale transcriptome data produced by Roche 454 sequencers. EasyCluster2 has been developed to speed up the creation of gene-oriented clusters and to facilitate downstream analyses such as the assembly of full-length transcripts. In addition, EasyCluster2 can use known annotations to refine the overall clustering procedure, embeds the AStalavista software to predict the impact of alternative splicing in each cluster, and writes output files in formats that can be uploaded to the UCSC Genome Browser for easy browsing of results. Thanks to its user-friendly interface, EasyCluster2 makes the results easier to interpret for researchers with no specific bioinformatics skills. The EasyCluster2 executable is freely available at https://code.google.com/p/easycluster2/.
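To make the idea of gene-oriented clustering concrete, the following Python sketch groups read alignments whose genomic spans overlap on the same chromosome and strand, then writes each cluster as a BED line of the kind the UCSC Genome Browser accepts. It is an illustration only: the abstract does not describe the EasyCluster2 algorithm, so the overlap rule, the `Alignment` record, and the functions `cluster_alignments` and `cluster_to_bed` are all hypothetical names and assumptions, not part of the software.

```python
from collections import defaultdict
from typing import NamedTuple


class Alignment(NamedTuple):
    """Hypothetical record for one read aligned to the genome."""
    read_id: str
    chrom: str
    strand: str
    start: int  # 0-based, inclusive (BED convention)
    end: int    # 0-based, exclusive (BED convention)


def cluster_alignments(alignments):
    """Group alignments that overlap on the same chromosome and strand.

    Assumption: simple interval overlap stands in for whatever criteria
    the real tool applies (splice-site agreement, annotations, etc.).
    """
    by_locus = defaultdict(list)
    for aln in alignments:
        by_locus[(aln.chrom, aln.strand)].append(aln)

    clusters = []
    for key in sorted(by_locus):
        current, current_end = [], None
        # Sweep left to right, extending the open cluster while reads overlap it.
        for aln in sorted(by_locus[key], key=lambda a: (a.start, a.end)):
            if current and aln.start < current_end:
                current.append(aln)
                current_end = max(current_end, aln.end)
            else:
                if current:
                    clusters.append(current)
                current, current_end = [aln], aln.end
        if current:
            clusters.append(current)
    return clusters


def cluster_to_bed(cluster, name):
    """Format one cluster as a six-column BED line.

    The score column holds the number of reads in the cluster,
    which is an illustrative choice.
    """
    chrom, strand = cluster[0].chrom, cluster[0].strand
    start = min(a.start for a in cluster)
    end = max(a.end for a in cluster)
    return f"{chrom}\t{start}\t{end}\t{name}\t{len(cluster)}\t{strand}"
```

Calling `cluster_alignments` on a list of `Alignment` records returns one list per cluster, and passing each list to `cluster_to_bed` yields a line that can be saved in a track file. A real pipeline would also need to handle spliced alignments, where a read covers several exon blocks rather than one contiguous interval.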