A tale of two sequences: Interpretable and linguistically-informed deep learning for natural language processing
Deep learning has swiftly taken over the field of natural language processing (NLP), causing a shift from exploiting linguistic features and structures to relying solely on the input words. As performance records on NLP benchmarks keep being broken, we may ask ourselves: are linguistic structures now obsolete?

In the first part of this thesis, we try to answer that question in the context of machine translation (MT). We find that a graph-convolutional network can be used to condition an MT model on linguistic input structures, and that doing so improves translation performance. We also investigate whether structure can be induced in an MT setting, and find that it is possible to learn useful structure on top of word embeddings and CNN representations, whereas the structure learned on top of LSTM representations is trivial.

In the second part of the thesis, we address two common criticisms of neural networks: their lack of interpretability, and their hunger for labeled data. We first study text classifiers and make them interpretable by having them provide a rationale for each prediction, that is, by highlighting the part of the input text on which the classification is based. We show that the resulting rationales are more aligned with human rationales than those of previous work. Finally, we investigate the generalization behavior of neural networks. On the SCAN benchmark, we find that a high score does not necessarily imply strong generalization, due to the simple nature of the data set, and we propose a remedy in the form of the NACS data set.
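As background for the graph-convolutional conditioning mentioned above, the sketch below shows a single graph convolution over token representations: each token's vector is updated from its neighbors in a linguistic graph. The chain-shaped adjacency matrix, the dimensions, and the row normalization are illustrative assumptions for this sketch, not the thesis's actual architecture.

```python
import numpy as np

def gcn_layer(H, A, W):
    """One graph-convolutional layer: aggregate neighbor features, project, apply ReLU.

    H: (n_tokens, d_in) token representations
    A: (n_tokens, n_tokens) symmetric adjacency over linguistic arcs
    W: (d_in, d_out) learned projection (random here, for illustration)
    """
    A_hat = A + np.eye(A.shape[0])           # add self-loops
    D_inv = np.diag(1.0 / A_hat.sum(axis=1)) # row-normalize by node degree
    return np.maximum(0.0, D_inv @ A_hat @ H @ W)

# Toy example: 4 tokens, 8-dim vectors, arcs forming a simple chain.
rng = np.random.default_rng(0)
H = rng.normal(size=(4, 8))
A = np.zeros((4, 4))
for i in range(3):
    A[i, i + 1] = A[i + 1, i] = 1.0          # undirected arc between adjacent tokens
W = rng.normal(size=(8, 8))

H_out = gcn_layer(H, A, W)
```

In an MT encoder, such a layer would sit on top of the word representations, letting each token incorporate information from its syntactic neighbors before decoding.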