Co-Learning Multimodality PET-CT Features via a Cascaded CNN-Transformer Network