TISRover: ConvNets learn biologically relevant features for effective translation initiation site prediction

As a key component of gene regulation, translation initiation is a well-studied topic. However, recent findings have shown translation initiation to be more complex than initially thought, calling for more effective prediction methods. In this paper, we present TISRover, a multi-layered convolutional neural network architecture for translation initiation site prediction. We achieve state-of-the-art results, outperforming a previous deep learning approach by 4% to 23% in terms of area under the precision-recall curve (auPRC), and other approaches by at least 68% in terms of error rate. Furthermore, we present a methodology to analyse the decision-making process of our network models. This analysis reveals various biologically relevant features for translation initiation site prediction that are learnt automatically from scratch, without any prior knowledge. The most notable features found are the Kozak consensus sequence, the reading frame characteristics, the influence of stop and start codons in the sequence, and the presence of donor splice site patterns.