Because underwater acoustic (UWA) channels vary over time, link disconnections commonly occur during long-term deployment of UWA networks. In a UWA data collection network, one destination collects data from multiple underwater nodes. With node cooperation, a node can be selected as a relay to forward data, in the retransmission phase, on behalf of another node whose link has failed. A key issue is that relay selection depends on channel state information (CSI). However, the channel usually varies while this information is being collected, which makes the selection decision inaccurate. In this paper, a Q-Learning based cooperation scheme is proposed for node selection in time-varying UWA channels, with suitably defined states, actions, and rewards. The state is a combination of CSI and mutual information, and the reward update functions are given. With the proposed method, the cooperative forwarding relay nodes are chosen according to rewards that are updated with channel variation information. Simulation results indicate that the proposed Q-Learning based cooperative scheme achieves better system capacity than random schemes, and that with predicted CSI its performance is close to the benchmark with ideal CSI.
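As an illustration only, the relay selection idea can be sketched with a generic tabular Q-learning loop. This is a minimal sketch under stated assumptions, not the scheme evaluated in the paper: the state is assumed to be a tuple of discretized CSI and mutual information levels, each action is the index of a candidate relay, and the reward is assumed to be the Shannon capacity of the chosen link. The function names and the values of `ALPHA`, `GAMMA`, and `EPSILON` are hypothetical.

```python
import math
import random
from collections import defaultdict

ALPHA = 0.1    # learning rate (assumed value)
GAMMA = 0.9    # discount factor (assumed value)
EPSILON = 0.1  # exploration probability (assumed value)


def make_q_table(num_relays):
    """Q-table mapping each state to one value per candidate relay."""
    return defaultdict(lambda: [0.0] * num_relays)


def select_relay(q_table, state, num_relays, rng, epsilon=EPSILON):
    """Epsilon-greedy choice: explore at random, else take the best-valued relay."""
    if rng.random() < epsilon:
        return rng.randrange(num_relays)
    values = q_table[state]
    return max(range(num_relays), key=lambda a: values[a])


def update_q(q_table, state, action, reward, next_state,
             alpha=ALPHA, gamma=GAMMA):
    """Standard tabular Q-learning update; returns the new Q-value."""
    best_next = max(q_table[next_state])
    td_target = reward + gamma * best_next
    q_table[state][action] += alpha * (td_target - q_table[state][action])
    return q_table[state][action]


def capacity_reward(snr_linear):
    """Hypothetical reward: capacity in bit/s/Hz of the chosen relay link."""
    return math.log2(1.0 + snr_linear)


if __name__ == "__main__":
    rng = random.Random(0)
    q = make_q_table(num_relays=3)
    state = (1, 2)       # assumed (CSI level, mutual information level)
    next_state = (2, 2)  # assumed state after the channel has varied
    relay = select_relay(q, state, 3, rng)
    update_q(q, state, relay, capacity_reward(3.0), next_state)
    print(relay, q[state])
```

Because the Q-values are updated after every transmission, a relay whose channel degrades gradually loses its ranking, which is the intuition behind choosing relays by rewards that track channel variation.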