Less is more: A rule-based syntactic simplification module for improved text-to-pictograph translation

Abstract

In order to enable or facilitate online communication for people with an intellectual disability, the Text-to-Pictograph translation system automatically translates written Dutch text into a series of Sclera or Beta pictographs. The baseline system presents the reader with a more or less verbatim, pictograph-per-word translation. As a result, long and complex input sentences lead to long and complex pictograph translations, which leave the end users confused and distracted. To overcome these problems, we developed a rule-based syntactic simplification module for Dutch Text-to-Pictograph translation. Because the simplification operations are applied recursively and in a logical order, only one syntactic parse is needed per message. The evaluation yields promising results.
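The idea that recursion lets a single parse drive several simplification operations can be illustrated with a small sketch. This is not the authors' rule set. It assumes a toy dependency tree whose relation labels (`cnj`, `crd`, `mod`, `su`, `hd`, `det`) and node fields are hypothetical. Two illustrative operations are shown. Coordinated clauses are split into separate sentences, and optional modifiers are dropped. Both operate on the same tree, so no re-parsing is needed.

```python
from __future__ import annotations

from dataclasses import dataclass, field

# Hypothetical label set: in this sketch, subtrees attached as "mod"
# (optional modifiers) are removed during simplification.
DROPPABLE = {"mod"}


@dataclass
class Node:
    rel: str                # dependency relation to the parent (hypothetical labels)
    cat: str = ""           # phrase category, e.g. "conj" or "smain"; empty for words
    word: str = ""          # surface form, only set on leaf nodes
    pos: int = 0            # position in the input, used to restore word order
    children: list[Node] = field(default_factory=list)


def leaves(node: Node) -> list[tuple[int, str]]:
    """Collect (position, word) pairs of a subtree, skipping droppable modifiers."""
    if node.rel in DROPPABLE:
        return []
    if not node.children:
        return [(node.pos, node.word)]
    return [leaf for child in node.children for leaf in leaves(child)]


def capitalize(words: list[str]) -> list[str]:
    """Upper-case the first letter of a new sentence."""
    if not words:
        return words
    return [words[0][:1].upper() + words[0][1:]] + words[1:]


def simplify(node: Node) -> list[list[str]]:
    """Recursively turn one parse tree into a list of shorter sentences."""
    if node.cat == "conj":
        # Split coordination: each conjunct ("cnj") is simplified on its own.
        # The coordinator ("crd", e.g. "en") is dropped. Nested coordination
        # is handled by the recursive call, without parsing again.
        return [s for child in node.children if child.rel == "cnj"
                for s in simplify(child)]
    # Any other node becomes one sentence: its remaining words in input order.
    return [capitalize([w for _, w in sorted(leaves(node))])]


def w(rel: str, word: str, pos: int) -> Node:
    """Shorthand for a leaf node."""
    return Node(rel=rel, word=word, pos=pos)


# "De hond blaft luid en de kat slaapt." (hand-built parse, for illustration)
tree = Node("top", cat="conj", children=[
    Node("cnj", cat="smain", children=[
        Node("su", cat="np", children=[w("det", "De", 0), w("hd", "hond", 1)]),
        w("hd", "blaft", 2),
        w("mod", "luid", 3),
    ]),
    w("crd", "en", 4),
    Node("cnj", cat="smain", children=[
        Node("su", cat="np", children=[w("det", "de", 5), w("hd", "kat", 6)]),
        w("hd", "slaapt", 7),
    ]),
])

if __name__ == "__main__":
    for sentence in simplify(tree):
        print(" ".join(sentence))
```

For this hand-built tree, the sketch prints `De hond blaft` and `De kat slaapt`. Keeping each word's original position means that deletions never disturb word order. The recursive call on each conjunct is what allows several operations to share one parse.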
