An Analysis of TopHat: A Fast Splice Junction Mapper for RNA Sequencing

To improve the overall performance of the Tuxedo pipeline, we concentrated on reducing the processing time of the programs (tools) that run within it. We first determined which programs in the core of the Tuxedo pipeline consume the most time. To do so, we processed multiple raw RNA-Seq datasets through the pipeline and recorded the time each tool took. This identified TopHat as the most time-consuming tool. Although TopHat is a fast and efficient spliced aligner, aligning RNA-Seq reads to a reference genome takes considerably longer than the other steps in the pipeline. To understand why TopHat's processing takes so long, we ran multiple independent raw RNA-Seq datasets through TopHat with different numbers of threads and recorded the execution time for each dataset. One would expect that increasing the number of threads reduces processing time; contrary to this expectation, the results show that processing time increases as the number of threads increases. After analyzing the processing times of all datasets and running comprehensive simulations, we found that the threads lack efficient communication and synchronization. Adding threads increases the communication and synchronization required among them, which sharply increases alignment time and, in turn, lengthens the overall processing time.
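The thread-scaling measurement described above can be sketched as a small timing harness. This is a minimal sketch, not the authors' actual setup: it assumes `tophat` is on the PATH and uses TopHat's `-p` (number of threads) and `-o` (output directory) options; the helper names `tophat_command` and `time_across_threads`, the index base name, and the read file names are hypothetical.

```python
import subprocess
import time
from typing import Callable, Dict, Iterable, List


def tophat_command(threads: int, index: str, reads: str, out_dir: str) -> List[str]:
    """Build a TopHat command line (hypothetical helper).

    -p sets the number of alignment threads, -o the output directory;
    the positional arguments are the Bowtie index base name and the reads file.
    """
    return ["tophat", "-p", str(threads), "-o", out_dir, index, reads]


def time_across_threads(
    build_cmd: Callable[[int], List[str]],
    thread_counts: Iterable[int],
) -> Dict[int, float]:
    """Run one command per thread count and record wall-clock seconds for each run."""
    timings: Dict[int, float] = {}
    for n in thread_counts:
        start = time.perf_counter()
        # check=True raises CalledProcessError on a non-zero exit,
        # so a failed run is never recorded as a valid timing.
        subprocess.run(build_cmd(n), check=True)
        timings[n] = time.perf_counter() - start
    return timings
```

For one dataset, `time_across_threads(lambda n: tophat_command(n, "genome_index", "sample_1.fastq", f"tophat_p{n}"), [1, 2, 4, 8])` would return a mapping from thread count to elapsed seconds; repeating the call for each dataset yields the per-dataset timings compared above. Giving each run its own output directory keeps one run's results from overwriting another's.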