Multiattentive Recurrent Neural Network Architecture for Multilingual Readability Assessment

We present a multiattentive recurrent neural network architecture for automatic multilingual readability assessment. The architecture takes raw words as its main input, but internally captures text structure and informs its word attention process with syntax- and morphology-related features, which are known to be of great importance to readability. This is achieved through a multiattentive strategy that allows the neural network to focus on specific parts of a text when predicting its reading level. We conducted an exhaustive evaluation on data sets covering multiple languages and prediction task types, comparing the proposed model with traditional, state-of-the-art, and other neural network strategies.
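As a rough illustration of the idea of word attention informed by auxiliary linguistic signals, the following minimal sketch pools per-word hidden states with several attention heads, each scoring words from the hidden state plus one kind of auxiliary feature. It is not the architecture described here: the recurrent encoder is omitted (the hidden states are taken as given), and the function names, the linear scoring form, the head names `pos` and `morph`, and all numeric values are assumptions made for illustration only.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def attend(word_states, aux_features, w_state, w_aux):
    """One hypothetical attention head.

    Each word is scored from its hidden state and an auxiliary feature
    vector (e.g. a part-of-speech encoding); the scores are normalized
    into attention weights, which then pool the hidden states.
    """
    scores = [dot(h, w_state) + dot(f, w_aux)
              for h, f in zip(word_states, aux_features)]
    weights = softmax(scores)
    dim = len(word_states[0])
    pooled = [sum(w * h[d] for w, h in zip(weights, word_states))
              for d in range(dim)]
    return weights, pooled


def multi_attend(word_states, aux_by_head, params):
    """Run one head per auxiliary feature type and concatenate the results."""
    pooled_all = []
    for name, aux in aux_by_head.items():
        w_state, w_aux = params[name]
        _, pooled = attend(word_states, aux, w_state, w_aux)
        pooled_all.extend(pooled)
    return pooled_all


# Toy input: three words with 2-dimensional hidden states (assumed to come
# from some recurrent encoder), plus made-up auxiliary features per word.
word_states = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
aux_by_head = {
    "pos": [[1.0, 0.0], [0.0, 1.0], [0.0, 1.0]],   # illustrative one-hot tags
    "morph": [[0.0], [1.0], [2.0]],                # illustrative scalar feature
}
params = {
    "pos": ([0.5, -0.5], [1.0, 0.0]),              # arbitrary, untrained weights
    "morph": ([0.0, 0.0], [1.0]),
}
text_vector = multi_attend(word_states, aux_by_head, params)
```

In a full model, a vector such as `text_vector` would feed a classification or regression layer that predicts the reading level, and the scoring weights would be learned rather than fixed by hand.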
