High-throughput Generative Inference of Large Language Models with a Single GPU