Learning When to Simplify Sentences for Natural Text Simplification

This paper introduces a corpus-based approach to selecting sentences that require simplification, in the context of a Brazilian Portuguese text simplification system. Based on a parallel corpus of original and simplified text versions, we apply a binary classifier to decide under which circumstances a sentence should or should not be split (splitting being the most important syntactic simplification operation), so that the resulting simplified text is natural and not oversimplified. Our classifier reaches 73.5% precision and 73.4% recall when selecting the sentences to be split or kept together.
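
To make the split-or-keep decision and its evaluation concrete, the following is a minimal Python sketch, not the classifier used in this work. The features (token count, comma count), the length threshold, and the function names `extract_features`, `should_split`, and `precision_recall` are all hypothetical and chosen only for illustration; a real system would learn the decision from the parallel corpus rather than use a fixed rule.

```python
from typing import List, Tuple


def extract_features(sentence: str) -> Tuple[int, int]:
    """Hypothetical features: whitespace token count and number of commas."""
    tokens = sentence.split()
    return len(tokens), sentence.count(",")


def should_split(sentence: str, max_tokens: int = 20) -> bool:
    """Toy stand-in for a trained binary classifier (assumed rule, not the paper's).

    Marks a sentence for splitting when it is longer than max_tokens
    and contains at least one comma.
    """
    n_tokens, n_commas = extract_features(sentence)
    return n_tokens > max_tokens and n_commas >= 1


def precision_recall(gold: List[bool], pred: List[bool]) -> Tuple[float, float]:
    """Precision and recall for the positive ("split") class."""
    tp = sum(1 for g, p in zip(gold, pred) if g and p)
    fp = sum(1 for g, p in zip(gold, pred) if not g and p)
    fn = sum(1 for g, p in zip(gold, pred) if g and not p)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall
```

Here `gold` would hold the decisions observed in the parallel corpus (whether the human simplifier split the sentence) and `pred` the classifier's decisions. The sketch scores only the "split" class; how the reported figures aggregate the "split" and "keep" classes is not specified in this abstract.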