Towards Natural Language-Driven Assembly Using Foundation Models