Complex Word Identification: Convolutional Neural Network vs. Feature Engineering

We describe the systems of the NLP-CIC team that participated in the Complex Word Identification (CWI) 2018 shared task. The shared task aimed to benchmark approaches for identifying complex words in English and other languages from the perspective of non-native speakers. Our goal is to compare two approaches: feature engineering and a deep neural network. Both approaches achieved comparable performance on the English test set. We demonstrate the flexibility of the deep-learning approach by using the same deep neural network setup in the Spanish track. Our systems achieved competitive results: all of them were within 0.01 of the best macro-F1 score on the test sets, except on the Wikipedia test set, where our best system was 0.04 below the best macro-F1 score.
