Extracting Keywords from Short Government Documents Using Reinforcement Learning

In this paper, we propose a novel approach to extracting keywords from a massive amount of unlabelled short government documents using reinforcement learning. To guide the policy network to keep important words, we introduce average rate regularization as a sparsity constraint in the model’s loss function. Analysis of the results shows that the proposed model outperforms traditional unsupervised keyword extraction approaches on a massive amount of unlabelled government document headlines.
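
As an illustration of how an average rate regularizer can act as a sparsity constraint, the following is a minimal sketch and not the model's actual loss. It assumes the policy network outputs a keep probability for each word and that the penalty is the absolute gap between the mean keep probability and a target rate. The names `average_keep_rate`, `regularized_loss`, `target_rate`, and `weight`, as well as all numeric values, are hypothetical.

```python
from typing import Sequence


def average_keep_rate(keep_probs: Sequence[float]) -> float:
    """Mean probability that the policy keeps a word in one headline."""
    if not keep_probs:
        raise ValueError("keep_probs must not be empty")
    return sum(keep_probs) / len(keep_probs)


def regularized_loss(
    task_loss: float,
    keep_probs: Sequence[float],
    target_rate: float = 0.25,  # hypothetical target fraction of kept words
    weight: float = 1.0,        # hypothetical regularization strength
) -> float:
    """Add an average-rate penalty to a task loss.

    Assumed penalty form: the absolute gap between the average keep rate
    and the target rate. Keeping far more (or fewer) words than the target
    raises the loss, which pushes the policy toward a sparse selection.
    """
    penalty = abs(average_keep_rate(keep_probs) - target_rate)
    return task_loss + weight * penalty


# Hypothetical keep probabilities for a six-word headline.
probs = [0.9, 0.1, 0.8, 0.1, 0.05, 0.05]
loss = regularized_loss(task_loss=0.4, keep_probs=probs, target_rate=0.25, weight=2.0)
print(f"average keep rate: {average_keep_rate(probs):.3f}, loss: {loss:.3f}")
```

In this sketch, a policy that keeps every word would have an average keep rate close to 1 and incur a large penalty, so the regularizer discourages the trivial solution of retaining the whole headline.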