Paper information - EMNLP@CPH: Is frequency all there is to simplicity?

Our system breaks down the problem of ranking a list of lexical substitutions by how simple they are in a given context into a series of pairwise comparisons between candidates, for which we learn a binary classifier. Because very little training data is provided, we describe a procedure for generating artificial unlabeled data from WordNet and a corpus, and we approach the classification task as a semi-supervised machine learning problem. We use a co-training procedure that lets each classifier extend the other classifier's training set with selected instances from an unlabeled data set. Our features include n-gram probabilities of the candidate and its context in a web corpus, distributional differences of the candidate between a corpus of "easy" sentences and a corpus of normal sentences, the syntactic complexity of documents that are similar to the given context, candidate length, and the letter-wise recognizability of the candidate as measured by a trigram character language model.
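
The reduction from ranking to pairwise comparisons can be illustrated with a minimal sketch. It assumes that the pairwise decisions are aggregated by counting wins, which the abstract does not specify, and it substitutes a toy length-based comparator for the learned binary classifier. The names `rank_by_pairwise_wins` and `shorter_is_simpler` are hypothetical and are not taken from the paper.

```python
from itertools import combinations
from typing import Callable, List, Sequence


def rank_by_pairwise_wins(
    candidates: Sequence[str],
    is_simpler: Callable[[str, str], bool],
) -> List[str]:
    """Order candidates from simplest to hardest by counting pairwise wins.

    Assumption: win counting is one simple aggregation scheme; the
    original system may combine its pairwise decisions differently.
    Candidates are assumed to be unique strings.
    """
    wins = {c: 0 for c in candidates}
    for a, b in combinations(candidates, 2):
        # One binary decision per unordered pair of candidates.
        if is_simpler(a, b):
            wins[a] += 1
        else:
            wins[b] += 1
    # sorted() is stable, so tied candidates keep their input order.
    return sorted(candidates, key=lambda c: -wins[c])


def shorter_is_simpler(a: str, b: str) -> bool:
    # Toy stand-in for the learned classifier: it uses only candidate
    # length, one of the features listed in the abstract.
    return len(a) < len(b)


ranking = rank_by_pairwise_wins(["utilize", "use", "employ"], shorter_is_simpler)
print(ranking)
```

In a fuller setting, `is_simpler` would wrap a trained classifier that also receives the context sentence; the ranking function itself would not need to change.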

Sigrid Klerke | Anders Søgaard | Anders Johannsen | Héctor Martínez Alonso
