A temporal attention based appearance model for video object segmentation