A Unified Approach for Text- and Image-guided 4D Scene Generation