Improved Self-Supervised Multilingual Speech Representation Learning Combined with Auxiliary Language Information