Semantic-Aware Implicit Neural Audio-Driven Video Portrait Generation