Automatic Construction of a Phonics Curriculum for Reading Education Using the Transformer Neural Network

Key to effective phonics instruction is teaching grapheme-phoneme (GP) correspondences in a systematic progression that starts with the most frequent and consistent pronunciation rules. However, discovering the relevant rules is not easy and usually requires subjective analysis by a native speaker, an expert linguist, or both. We describe GPA4.0, a submodule for the Transformer neural network model that automates grapheme-to-phoneme (g2p) transcription and alignment. The network is trained on four languages of decreasing orthographic transparency (from most to least transparent: Spanish, Portuguese, French, English). Our results show that the Transformer model improves on the current state of the art in g2p transcription and that its attention mechanism allows graphemes to be aligned with their corresponding phonemes. From the g2p-aligned words, our software produces an optimally ordered phonics progression based on frequency and consistency in the target language, as well as an ordered list of words that teachers can use. This work exemplifies a practical way in which neural networks can be used to develop educational materials for researchers and teachers. Submodules and phonics output are available at https://github.com/OlivierDehaene/GPA4.0.
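To make the idea of a progression ordered by frequency and consistency concrete, the following is a minimal Python sketch, not the GPA4.0 implementation. It assumes that words have already been aligned into (grapheme, phoneme) pairs and that each word carries a corpus count. The function name `rank_gp_correspondences`, the toy data, and the choice of ranking by the product of frequency and consistency are all illustrative assumptions. The abstract does not specify the exact scoring rule.

```python
from collections import Counter


def rank_gp_correspondences(aligned_words):
    """Rank grapheme-phoneme (GP) pairs for a phonics progression.

    aligned_words: iterable of (gp_pairs, word_count), where gp_pairs is a
    list of (grapheme, phoneme) tuples for one word (hypothetical format).
    Returns rows of (grapheme, phoneme, frequency, consistency), best first.
    """
    pair_counts = Counter()      # how often each GP pair occurs
    grapheme_counts = Counter()  # how often each grapheme occurs overall
    for gp_pairs, word_count in aligned_words:
        for grapheme, phoneme in gp_pairs:
            pair_counts[(grapheme, phoneme)] += word_count
            grapheme_counts[grapheme] += word_count

    total = sum(pair_counts.values())
    rows = []
    for (grapheme, phoneme), n in pair_counts.items():
        frequency = n / total                    # share of all GP tokens
        consistency = n / grapheme_counts[grapheme]  # P(phoneme | grapheme)
        rows.append((grapheme, phoneme, frequency, consistency))

    # Assumed scoring rule: frequency * consistency, ties broken alphabetically.
    rows.sort(key=lambda r: (-(r[2] * r[3]), r[0], r[1]))
    return rows


# Toy aligned lexicon with made-up counts (illustration only).
toy = [
    ([("c", "k"), ("a", "æ"), ("t", "t")], 10),             # "cat"
    ([("c", "s"), ("i", "ɪ"), ("t", "t"), ("y", "i")], 2),  # "city"
    ([("t", "t"), ("a", "æ"), ("p", "p")], 5),              # "tap"
]
ranking = rank_gp_correspondences(toy)
for grapheme, phoneme, frequency, consistency in ranking:
    print(f"{grapheme} -> {phoneme}  freq={frequency:.3f}  cons={consistency:.3f}")
```

In this toy lexicon the fully consistent, frequent pair t -> /t/ ranks first, while the rare and inconsistent c -> /s/ ranks last. This matches the intuition that a progression should start with the most frequent and most reliable correspondences.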
