Can 3D Vision-Language Models Truly Understand Natural Language?