Predicting Algorithm Classes for Programming Word Problems

We introduce the task of algorithm class prediction for programming word problems. A programming word problem is a problem written in natural language, which can be solved using an algorithm or a program. We define classes of various programming word problems which correspond to the class of algorithms required to solve the problem. We present four new datasets for this task, two multiclass datasets with 550 and 1159 problems each and two multilabel datasets having 3737 and 3960 problems each. We pose the problem as a text classification problem and train neural network and non-neural network-based models on this task. Our best performing classifier gets an accuracy of 62.7 percent for the multiclass case on the five class classification dataset, Codeforces Multiclass-5 (CFMC5). We also do some human-level analysis and compare human performance with that of our text classification models. Our best classifier has an accuracy only 9 percent lower than that of a human on this task. To the best of our knowledge, these are the first reported results on such a task. We make our code and datasets publicly available.

[1]  Yoshua Bengio,et al.  Understanding the difficulty of training deep feedforward neural networks , 2010, AISTATS.

[2]  Oren Etzioni,et al.  Learning to Solve Arithmetic Word Problems with Verb Categorization , 2014, EMNLP.

[3]  Christopher M. Bishop,et al.  Pattern Recognition and Machine Learning (Information Science and Statistics) , 2006 .

[4]  Diyi Yang,et al.  Hierarchical Attention Networks for Document Classification , 2016, NAACL.

[5]  R. Suresh,et al.  Classification and Recommendation of Competitive Programming Problems Using CNN , 2017 .

[6]  Sumit Gulwani,et al.  Program Synthesis , 2017, Software Systems Safety.

[7]  Geoffrey E. Hinton,et al.  Learning internal representations by error propagation , 1986 .

[8]  Michal Forisek,et al.  The Difficulty of Programming Contests Increases , 2009, ISSEP.

[9]  Christopher D. Manning,et al.  Baselines and Bigrams: Simple, Good Sentiment and Topic Classification , 2012, ACL.

[10]  Luke S. Zettlemoyer,et al.  Learning to Automatically Solve Algebra Word Problems , 2014, ACL.

[11]  Hirokazu Anai,et al.  Semantic Parsing of Pre-university Math Problems , 2017, ACL.

[12]  Jimmy Ba,et al.  Adam: A Method for Stochastic Optimization , 2014, ICLR.

[13]  Richard Socher,et al.  Seq2SQL: Generating Structured Queries from Natural Language using Reinforcement Learning , 2018, ArXiv.

[14]  Yiming Yang,et al.  Deep Learning for Extreme Multi-label Text Classification , 2017, SIGIR.

[15]  Sebastian Ruder,et al.  Universal Language Model Fine-tuning for Text Classification , 2018, ACL.

[16]  Chih-Jen Lin,et al.  LIBLINEAR: A Library for Large Linear Classification , 2008, J. Mach. Learn. Res..

[17]  Tomas Mikolov,et al.  Bag of Tricks for Efficient Text Classification , 2016, EACL.

[18]  Ronan Le Bras,et al.  Beyond Sentential Semantic Parsing: Tackling the Math SAT with a Cascade of Tree Transducers , 2017, EMNLP.

[19]  Xi Victoria Lin Program Synthesis from Natural Language Using Recurrent Neural Networks , 2017 .

[20]  Jeffrey Pennington,et al.  GloVe: Global Vectors for Word Representation , 2014, EMNLP.

[21]  Gaël Varoquaux,et al.  Scikit-learn: Machine Learning in Python , 2011, J. Mach. Learn. Res..

[22]  Chih-Jen Lin,et al.  LIBSVM: A library for support vector machines , 2011, TIST.

[23]  Oleksandr Makeyev,et al.  Neural network with ensembles , 2010, The 2010 International Joint Conference on Neural Networks (IJCNN).

[24]  Shuming Shi,et al.  Deep Neural Solver for Math Word Problems , 2017, EMNLP.

[25]  Gayle Laakmann McDowell Cracking the Coding Interview: 189 Programming Questions and Solutions , 2015 .

[26]  Jun Zhao,et al.  Recurrent Convolutional Neural Networks for Text Classification , 2015, AAAI.

[27]  Andreas Christmann,et al.  Support vector machines , 2008, Data Mining and Knowledge Discovery Handbook.

[28]  Yoon Kim,et al.  Convolutional Neural Networks for Sentence Classification , 2014, EMNLP.

[29]  Luca Antiga,et al.  Automatic differentiation in PyTorch , 2017 .

[30]  Dipti Misra Sharma,et al.  Deep Neural Network based system for solving Arithmetic Word problems , 2017, IJCNLP.

[31]  Yoshua Bengio,et al.  Domain Adaptation for Large-Scale Sentiment Classification: A Deep Learning Approach , 2011, ICML.

[32]  Thomas Gertin,et al.  Maximizing the Cost of Shortest Paths between Facilities through Optimal Product Category Locations , 2012 .

[33]  Alvin Cheung,et al.  Summarizing Source Code using a Neural Attention Model , 2016, ACL.