EST: Evaluating Scientific Thinking in Artificial Agents