CAST: Concurrent Recognition and Segmentation with Adaptive Segment Tokens