Paraphrase generation based on lexical knowledge and features for a natural language question answering system

A question answering (QA) system constructs answers to questions automatically by querying either a structured database, known as a knowledge base, or an unstructured collection of documents. Paraphrasing approaches are widely used to handle paraphrastic variation in natural language QA systems. Machine-learning-based paraphrasing for Korean requires a large-scale monolingual or bilingual corpus. To date, however, a well-structured corpus is lacking, and it is difficult to obtain noise-free alignment data between Korean and English for entailment. This paper instead generates paraphrase sentences using synonym knowledge and various features of full morphemes. The results demonstrate that paraphrase quality can be improved by the following features: the morpheme type, the dependencies, and the semantic arguments. Features derived from semantic role labeling (SRL) can also help with word sense disambiguation (WSD) for lexical replacement in Korean.
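
As a rough illustration of feature-constrained lexical replacement, the following minimal Python sketch substitutes a synonym only when the morpheme type matches and, where the synonym entry carries a sense restriction, when the SRL argument label agrees. It is not the system described in this paper: the `Morpheme` and `Synonym` classes, the `SYNONYMS` table, the role labels, and the English example tokens standing in for Korean morphemes are all hypothetical, and dependency features are omitted for brevity.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, List, Optional, Tuple


@dataclass(frozen=True)
class Morpheme:
    surface: str
    pos: str                     # morpheme type, e.g. "NOUN", "VERB"
    role: Optional[str] = None   # semantic argument label from SRL, if any


@dataclass(frozen=True)
class Synonym:
    surface: str
    # Roles under which this sense is valid; None means no restriction.
    allowed_roles: Optional[FrozenSet[str]] = None


# Hypothetical synonym knowledge, keyed by (surface, morpheme type).
SYNONYMS: Dict[Tuple[str, str], List[Synonym]] = {
    ("capital", "NOUN"): [
        Synonym("capital city", frozenset({"ARG1"})),  # location sense
        Synonym("funds", frozenset({"ARG0"})),         # money sense
    ],
}

# Only full (content) morphemes are candidates for replacement.
CONTENT_POS = {"NOUN", "VERB"}


def candidates(m: Morpheme) -> List[str]:
    """Return synonyms whose type and role constraints match the morpheme."""
    if m.pos not in CONTENT_POS:
        return []
    return [
        syn.surface
        for syn in SYNONYMS.get((m.surface, m.pos), [])
        if syn.allowed_roles is None or m.role in syn.allowed_roles
    ]


def paraphrases(sentence: List[Morpheme]) -> List[str]:
    """Generate one paraphrase per valid single-morpheme replacement."""
    results = []
    for i, m in enumerate(sentence):
        for replacement in candidates(m):
            words = [x.surface for x in sentence]
            words[i] = replacement
            results.append(" ".join(words))
    return results


# Toy question; the role label on "capital" is an assumed SRL output.
QUESTION = [
    Morpheme("what", "PRON"),
    Morpheme("is", "VERB"),
    Morpheme("the", "DET"),
    Morpheme("capital", "NOUN", role="ARG1"),
    Morpheme("of", "ADP"),
    Morpheme("Korea", "PROPN"),
]

print(paraphrases(QUESTION))
```

In this toy setup the role label acts as a crude sense filter: with `ARG1` on "capital", only the location sense survives, so the money-sense synonym is never substituted.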