i-Code V2: An Autoregressive Generation Framework over Vision, Language, and Speech Data