Dynamic memory network with spatial-temporal feature fusion for visual tracking