Hierarchical Transformers Are More Efficient Language Models