Towards an automatic requirements classification in a new Spanish dataset

Machine Learning (ML) algorithms have become a powerful tool for software requirements classification. Nevertheless, most research on requirements classification targets English, and other languages have received far less attention. Given the lack of datasets in Spanish, we created a new dataset from a collection of requirements taken from final degree projects at the University of A Coruña. In this paper, we investigate which combinations of text vectorization techniques and ML algorithms perform best for requirements classification on this Spanish dataset. We found that SVM with TF-IDF achieves the highest F1-score (0.95 and 0.79 for functional and non-functional classification, respectively).
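To make the best-performing combination concrete, the following is a minimal sketch of a TF-IDF plus SVM classifier using scikit-learn. It is not the authors' pipeline: the example sentences are invented for illustration, the labels "F" and "NF" are hypothetical names, and the choice of a linear SVM (`LinearSVC`) with default TF-IDF settings is an assumption, since the abstract does not state the kernel or preprocessing used.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import f1_score
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Hypothetical toy requirements, invented for illustration
# (not taken from the dataset described in the paper).
requirements = [
    "El sistema debe permitir al usuario registrarse con su correo.",
    "El usuario podrá consultar el historial de pedidos.",
    "El administrador podrá dar de alta nuevos productos.",
    "El sistema debe enviar una notificación al confirmar la compra.",
    "La aplicación debe responder en menos de dos segundos.",
    "El sistema debe estar disponible el 99 % del tiempo.",
    "Los datos personales deben almacenarse cifrados.",
    "La interfaz debe ser accesible desde dispositivos móviles.",
]
# "F" = functional, "NF" = non-functional (label names are an assumption).
labels = ["F", "F", "F", "F", "NF", "NF", "NF", "NF"]

# TF-IDF turns each requirement into a sparse weighted term vector;
# the linear SVM then learns a separating hyperplane over those vectors.
model = make_pipeline(TfidfVectorizer(lowercase=True), LinearSVC(random_state=0))
model.fit(requirements, labels)

# Scoring on the training data only demonstrates the API; a real
# evaluation would use held-out data or cross-validation.
predictions = model.predict(requirements)
train_f1 = f1_score(labels, predictions, pos_label="F")
print(f"F1 on the toy training set (functional class): {train_f1:.2f}")
```

The score printed here says nothing about the results reported in the paper, which were obtained on the authors' own dataset and experimental setup.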
