On Optimal Caching and Model Multiplexing for Large Model Inference