Expressive Talking Head Generation with Granular Audio-Visual Control