Multi-modal sequence model with gated fully convolutional blocks for micro-video venue classification