In Defense of Grid Features for Visual Question Answering