In this work, we develop a reinforcement learning (RL) based model-free approach to obtain a policy for joint packet scheduling and rate adaptation, such that the packet drop probability (PDP) is minimized. The developed learning scheme yields an online cross-layer scheduling policy which takes into account the randomness in packet arrivals and wireless channels, as well as the state of packet buffers. Inherent difference in the time-scales of packet arrival process and the wireless channel variations leads to sparsity in the observed reward signal. Since an RL agent learns by using the feedback obtained in terms of rewards for its actions, the sample complexity of RL approach increases exponentially due to resulting sparsity. Therefore, a basic RL based approach, e.g., double deep Q-network (DDQN) based RL, results in a policy with negligible performance gain over the state-of-the-art schemes, such as shortest processing time (SPT) based scheduling. In order to alleviate the sparse reward problem, we leverage prioritized experience replay (PER) and develop a DDQN-based learning scheme with PER. We observe through simulations that the policy learned using DDQN-PER approach results in a 3-5% lower PDP, compared to both the basic DDQN based RL and SPT scheme.