Transformer Based Unsupervised Pre-Training for Acoustic Representation Learning