Optimal learning control for discrete-time nonlinear systems using generalized policy iteration based adaptive dynamic programming