Augmenting bag-of-words: a robust contextual representation of spatiotemporal interest points for action recognition