Learning from Untrimmed Videos: Self-Supervised Video Representation Learning with Hierarchical Consistency