Joint Learning of LSTMs-CNN and Prototype for Micro-video Venue Classification

The venue category of a micro-video is an important cue for social network applications such as location-oriented applications and personalized services. Existing micro-video venue classification methods have two weaknesses. First, unsuitable convolutional filter sizes and padding reduce the discriminative power of the learned features. Second, the softmax layer limits robustness. To alleviate these problems, we propose a novel learning framework that jointly learns LSTMs-CNN and Prototype for micro-video venue classification. Specifically, an LSTMs-CNN with SAME convolutional padding and small convolutional filters extracts spatio-temporal information. Prototypes are learned simultaneously to improve robustness compared with the softmax classification function. We adopt a Euclidean distance loss to train the whole network. Extensive experiments on a real-world dataset show that our model significantly outperforms state-of-the-art baselines in terms of both Micro-F and Macro-F scores.
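The abstract proposes replacing the softmax decision with learned prototypes and a Euclidean distance loss. The snippet below is a minimal sketch of generic prototype-based classification, not the authors' implementation. It assumes that each venue category owns one or more prototype vectors in the feature space produced by the LSTMs-CNN, that a sample is assigned to the category of its nearest prototype under squared Euclidean distance, and that the training loss pulls a feature toward the closest prototype of its true category. All function names, the `gamma` scale, and the toy category labels are hypothetical, and the exact loss used in the paper may differ.

```python
import math


def squared_euclidean(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest_prototype(feature, prototypes):
    """Return (label, distance) of the closest prototype.

    `prototypes` is a hypothetical dict mapping a venue category label
    to a list of prototype vectors (one or more per category).
    """
    best_label, best_dist = None, math.inf
    for label, protos in prototypes.items():
        for p in protos:
            d = squared_euclidean(feature, p)
            if d < best_dist:
                best_label, best_dist = label, d
    return best_label, best_dist


def prototype_loss(feature, label, prototypes):
    """Assumed Euclidean distance loss: squared distance from the feature
    to the nearest prototype of its true category."""
    return min(squared_euclidean(feature, p) for p in prototypes[label])


def class_probabilities(feature, prototypes, gamma=1.0):
    """Distance-based class scores: softmax over -gamma * (min distance per class).

    Subtracting the max score keeps the exponentials numerically stable.
    """
    scores = {
        label: -gamma * min(squared_euclidean(feature, p) for p in protos)
        for label, protos in prototypes.items()
    }
    m = max(scores.values())
    exps = {k: math.exp(v - m) for k, v in scores.items()}
    z = sum(exps.values())
    return {k: v / z for k, v in exps.items()}


if __name__ == "__main__":
    # Toy 2-D prototypes for two hypothetical venue categories.
    toy_prototypes = {"restaurant": [[0.0, 0.0]], "beach": [[3.0, 4.0]]}
    feature = [0.5, 0.5]  # stand-in for an LSTMs-CNN feature vector
    print(nearest_prototype(feature, toy_prototypes))  # ('restaurant', 0.5)
    print(prototype_loss(feature, "beach", toy_prototypes))  # 18.5
    print(class_probabilities(feature, toy_prototypes))
```

Because the decision depends only on distances to prototypes, a feature far from every prototype can in principle be flagged as unreliable, which is one commonly cited reason prototype-based classifiers are considered more robust than a plain softmax layer.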