Do We Really Need Temporal Convolutions in Action Segmentation?