A new learning control approach suitable for problems with finite action space