Using Eye-tracking Data to Predict the Readability of Brazilian Portuguese Sentences in Single-task, Multi-task and Sequential Transfer Learning Approaches

Sentence complexity assessment is a relatively new task in Natural Language Processing. One of its aims is to identify which sentences in a text are more complex, in order to support the simplification of content for a target audience (e.g., children, cognitively impaired users, non-native speakers and low-literacy readers (Scarton and Specia, 2018)). The task is evaluated on datasets of aligned sentence pairs, each containing a complex and a simple version of the same sentence. For Brazilian Portuguese, the task was addressed by Leal et al. (2018), who built the first dataset for evaluating it in this language and reached 87.8% accuracy with linguistic features. The present work advances these results using models inspired by Gonzalez-Garduño and Søgaard (2018), which hold the state of the art for English, with multi-task learning and eye-tracking measures. First-Pass Duration, Total Regression Duration and Total Fixation Duration were used at two stages: first to select a subset of linguistic features, and then as an auxiliary task in the multi-task and sequential transfer learning models. The best model proposed here sets a new state of the art for Portuguese with 97.5% accuracy, an increase of almost 10 percentage points over the best previous result. We also propose improvements to the public dataset, based on an analysis of the errors made by our best model.

[1]  Hilario Inacio Bohn LINGUISTIC COMPLEXITY AND TEXT COMPREHENSION - Readability Issues Reconsidered by Davison and Green , 1990 .

[2]  Caroline Gasperin,et al.  SIMPLIFICA: a tool for authoring simplified texts in Brazilian Portuguese guided by readability assessments , 2010, NAACL.

[3]  C. Perfetti,et al.  Linguistic complexity and text comprehension : readability issues reconsidered , 1989 .

[4]  Walt Detmar Meurers,et al.  Assessing the relative reading level of sentence pairs for text simplification , 2014, EACL.

[5]  Vera Demberg,et al.  Psycholinguistic Models of Sentence Processing Improve Sentence Readability Ranking , 2017, EACL.

[6]  Anders Søgaard,et al.  Learning to Predict Readability Using Eye-Movement Data From Natives and Learners , 2018, AAAI.

[7]  Giovanni Pilato,et al.  A Neural Network model for the Evaluation of Text Complexity in Italian Language: a Representation Point of View , 2018, BICA.

[8]  Sandra M. Aluísio,et al.  A Nontrivial Sentence Corpus for the Task of Sentence Readability Assessment in Portuguese , 2018, COLING.

[9]  Thomas Wolf,et al.  Transfer Learning in Natural Language Processing , 2019, NAACL.

[10]  William H. DuBay Smart Language: Readers, Readability, and the Grading of Text , 2007 .

[11]  Mark Steedman,et al.  Assessing Relative Sentence Complexity using an Incremental CCG Parser , 2016, NAACL.

[12]  Walt Detmar Meurers,et al.  Readability assessment for text simplification: From analysing documents to identifying sentential simplifications , 2014 .

[13]  Steven G. Luke,et al.  The Provo Corpus: A large eye-tracking corpus with predictability norms , 2018, Behavior research methods.

[14]  Lucia Specia,et al.  Building a Brazilian Portuguese Parallel Corpus of Original and Simplified Texts , 2009 .

[15]  Giosuè Lo Bosco,et al.  Deep Neural Attention-Based Model for the Evaluation of Italian Sentences Complexity , 2020, 2020 IEEE 14th International Conference on Semantic Computing (ICSC).

[16]  Lucia Specia,et al.  Learning Simplifications for Specific Target Audiences , 2018, ACL.

[17]  Samar Husain,et al.  Quantifying sentence complexity based on eye-tracking measures , 2016, CL4LC@COLING 2016.

[18]  Elena Lloret,et al.  Towards Adaptive Text Summarization: How Does Compression Rate Affect Summary Readability of L2 Texts? , 2019, RANLP.

[19]  Gaël Varoquaux,et al.  Scikit-learn: Machine Learning in Python , 2011, J. Mach. Learn. Res.

[20]  C. A. Weaver,et al.  Psychology of Reading , 2012 .

[21]  Sara Tonelli,et al.  MUSST: A Multilingual Syntactic Simplification Tool , 2017, IJCNLP.

[22]  Lucia Specia,et al.  Text Simplification from Professionally Produced Corpora , 2018, LREC.

[23]  Walt Detmar Meurers,et al.  Readability-based Sentence Ranking for Evaluating Text Simplification , 2016, ArXiv.

[24]  Sandra M. Aluísio,et al.  Evaluating Progression of Alzheimer's Disease by Regression and Classification Methods in a Narrative Language Test in Portuguese , 2016, PROPOR.

[25]  Heiner Stuckenschmidt,et al.  Automatic Assessment of Absolute Sentence Complexity , 2017, IJCAI.

[26]  Simonetta Montemagni,et al.  READ–IT: Assessing Readability of Italian Texts with a View to Text Simplification , 2011, SLPAT.

[27]  Felice Dell'Orletta,et al.  Assessing the Readability of Sentences: Which Corpora and Features? , 2014, BEA@ACL.

[28]  Kevyn Collins-Thompson,et al.  Computational Assessment of Text Readability: A Survey of Current and Future Research , 2014 .

[29]  B. Lyxell,et al.  Looking at text simplification: Using eye tracking to evaluate the readability of automatically simplified sentences , 2018 .

[30]  Felice Dell'Orletta,et al.  Is this Sentence Difficult? Do you Agree? , 2018, EMNLP.

[31]  Sandra M. Aluísio,et al.  A Lightweight Regression Method to Infer Psycholinguistic Properties for Brazilian Portuguese , 2017, TSD.