Knowing What it is: Semantic-Enhanced Dual Attention Transformer