Moving away from semantic overfitting in disambiguation datasets

Entities and events in the world have no frequency, but our communication about them, and the expressions we use to refer to them, do have a strong frequency profile. Language expressions and their meanings follow a Zipfian distribution, featuring a small number of very frequent observations and a very long tail of low-frequency observations. Since our NLP datasets sample texts but do not sample the world, they are no exception to Zipf's law. This causes a lack of representativeness in our NLP tasks, leading to models that can capture the head phenomena in language but fail when dealing with the long tail. We therefore propose a referential challenge for semantic NLP that reflects a higher degree of ambiguity and variance and captures a wide range of small real-world phenomena. To perform well, systems would have to show deep understanding of the linguistic tail.
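As a minimal illustration of the Zipfian head/tail profile described above (a toy corpus invented here, not data from the paper), one can rank word frequencies in a text and compare them to the 1/rank curve predicted by Zipf's law, where the rank-r word has frequency roughly f1 / r:

```python
from collections import Counter

# Toy corpus: a few high-frequency "head" words and a long tail of rarer ones.
text = (
    "the cat sat on the mat and the dog sat on the rug "
    "the cat and the dog saw the mat"
).split()

counts = Counter(text)
ranked = counts.most_common()      # [(word, freq), ...] in descending frequency
f1 = ranked[0][1]                  # frequency of the most frequent word

for rank, (word, freq) in enumerate(ranked, start=1):
    predicted = f1 / rank          # Zipf's prediction for this rank
    print(f"{rank:2d} {word:5s} observed={freq} zipf~{predicted:.1f}")
```

Even on such a tiny sample, a single word ("the") dominates while most words occur only once or twice, which is the skew that makes head phenomena easy and tail phenomena hard for systems trained on sampled text.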
