Numeracy for Language Models: Evaluating and Improving their Ability to Predict Numbers

Numeracy is the ability to understand and work with numbers. It is a necessary skill for composing and understanding documents in clinical, scientific, and other technical domains. In this paper, we explore different strategies for modelling numerals with language models, such as memorisation and digit-by-digit composition, and propose a novel neural architecture that uses a continuous probability density function to model numerals from an open vocabulary. Our evaluation on clinical and scientific datasets shows that using hierarchical models to distinguish numerals from words improves a perplexity metric on the subset of numerals by 2 and 4 orders of magnitude, respectively, over non-hierarchical models. A combination of strategies can further improve perplexity. Our continuous probability density function model reduces mean absolute percentage errors by 18% and 54% in comparison to the second best strategy for each dataset, respectively.
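To make the hierarchical and continuous-density strategies above concrete, here is a minimal PyTorch sketch, not the paper's exact architecture: the class name HierarchicalNumeralHead, the two-way gate, and the mixture-of-Gaussians parameterisation are illustrative assumptions. A binary gate first distinguishes numerals from words; words receive a standard softmax over the vocabulary, while a numeral's value is scored under a continuous mixture-of-Gaussians density.

    import torch
    import torch.nn as nn

    class HierarchicalNumeralHead(nn.Module):
        """Illustrative output layer: a gate decides word vs. numeral;
        words get a softmax over the vocabulary, numerals a
        mixture-of-Gaussians density over their real values."""

        def __init__(self, hidden_size, vocab_size, n_components=5):
            super().__init__()
            self.gate = nn.Linear(hidden_size, 2)              # word vs. numeral
            self.word_logits = nn.Linear(hidden_size, vocab_size)
            # Mixture-of-Gaussians parameters for the numeral branch.
            self.mix_logits = nn.Linear(hidden_size, n_components)
            self.means = nn.Linear(hidden_size, n_components)
            self.log_stds = nn.Linear(hidden_size, n_components)

        def numeral_log_prob(self, h, value):
            # log p(numeral, value | h)
            #   = log p(numeral | h) + log density(value | h)
            log_gate = torch.log_softmax(self.gate(h), dim=-1)[..., 1]
            log_w = torch.log_softmax(self.mix_logits(h), dim=-1)
            comp = torch.distributions.Normal(
                self.means(h), self.log_stds(h).exp())
            log_density = torch.logsumexp(
                log_w + comp.log_prob(value.unsqueeze(-1)), dim=-1)
            return log_gate + log_density

Training such a head amounts to maximising the gated log-likelihood of each token: the word branch for word tokens, the numeral branch for numeric tokens. Note that evaluating perplexity on numerals then requires converting the continuous density into a probability over numeral strings (e.g. by discretising to the observed precision), which is a detail this sketch leaves out.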
