Learning from delayed rewards