Comparing Plausibility Estimates in Base and Instruction-Tuned Large Language Models