Sounding Video Generator: A Unified Framework for Text-Guided Sounding Video Generation