Transformer-based Hebrew NLP models for Short Answer Scoring in Biology

Pre-trained language models (PLMs) can be adapted to a wide range of downstream tasks by fine-tuning their rich contextual embeddings, often without requiring much task-specific data. In this paper, we explore the use of AlephBERT, a recently developed Hebrew PLM, for automated short answer grading of high school biology items. We show that the AlephBERT-based system outperforms a strong CNN-based baseline, and that it generalizes unexpectedly well in a zero-shot paradigm to items on an unseen topic that address the same underlying biological concepts, opening up the possibility of automatically assessing new items without item-specific fine-tuning.
