Optimizing a Continuum Manipulator’s Search Policy Through Model-Free Reinforcement Learning