Finite-horizon optimal control of discrete-time linear systems with completely unknown dynamics using Q-learning