Teaching Natural Language Processing through Big Data Text Summarization with Problem-Based Learning

Abstract

Natural language processing (NLP) covers a large number of topics and tasks related to data and information management, which makes it complex and challenging to teach. Problem-based learning, meanwhile, is a teaching technique designed to motivate students to learn efficiently, work collaboratively, and communicate effectively. We therefore developed a problem-based learning course that teaches NLP to both undergraduate and graduate students. We provided student teams with big data sets, basic guidelines, cloud computing resources, and other aids to help them summarize two types of big collections: Web pages related to events, and electronic theses and dissertations (ETDs). The teams then deployed different libraries, tools, methods, and algorithms to solve the task of big data text summarization. Summarization is an ideal problem for learning NLP because it involves all levels of linguistics, as well as many of the tools and techniques used by NLP practitioners. The evaluation results showed that all teams generated coherent and readable summaries. Many summaries were of high quality and accurately described their corresponding events or ETD chapters, and the teams produced them, along with their NLP pipelines, in a single semester. Further, feedback from both undergraduate and graduate students was more positive than for other courses in the Department of Computer Science, and the difference was statistically significant. Accordingly, we encourage educators in the data and information management field to use our approach or similar methods in their teaching, and we hope that other researchers will also use our data sets and synergistic solutions to approach the new and challenging tasks we addressed.
