Can Language Models Learn to Listen?