Self-supervised Semantic-driven Phoneme Discovery for Zero-resource Speech Recognition