Paper information - Towards an automatic requirements classification in a new Spanish dataset


Machine Learning (ML) algorithms have become a powerful instrument in software requirements classification. Nevertheless, most research on requirements classification targets English, with less attention paid to other languages. Given the lack of datasets in Spanish, we created a new dataset from a collection of requirements taken from final degree projects at the University of A Coruña. In this paper, we investigate which combinations of text vectorization techniques and ML algorithms perform best for requirements classification on this Spanish dataset. We found that SVM with TF-IDF gives the highest F1-score (0.95 for functional and 0.79 for non-functional classification).
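To make the best-performing combination concrete, the following is a minimal sketch of TF-IDF vectorization followed by a linear SVM, written in plain Python. It is not the authors' implementation: the abstract does not state their tooling, preprocessing, or hyperparameters. The toy Spanish requirements are invented for illustration, the function names (`fit_idf`, `tfidf_vector`, `train_linear_svm`, `classify`) are hypothetical, and the SVM is a simplified variant trained by subgradient descent on the L2-regularized hinge loss with no bias term.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase and split on whitespace (no stemming or stop-word removal)."""
    return text.lower().split()


def fit_idf(docs):
    """Smoothed inverse document frequency for every term seen in training."""
    n_docs = len(docs)
    doc_freq = Counter()
    for doc in docs:
        doc_freq.update(set(tokenize(doc)))
    return {t: math.log((1 + n_docs) / (1 + df)) + 1.0 for t, df in doc_freq.items()}


def tfidf_vector(doc, idf):
    """Sparse, L2-normalized TF-IDF vector; terms unseen in training are ignored."""
    counts = Counter(t for t in tokenize(doc) if t in idf)
    vec = {t: c * idf[t] for t, c in counts.items()}
    norm = math.sqrt(sum(v * v for v in vec.values())) or 1.0
    return {t: v / norm for t, v in vec.items()}


def dot(w, x):
    """Dot product between a sparse weight dict and a sparse feature dict."""
    return sum(w.get(t, 0.0) * v for t, v in x.items())


def train_linear_svm(vectors, labels, epochs=100, lr=0.1, lam=0.01):
    """Subgradient descent on lam/2 * ||w||^2 + hinge loss; labels are +1 or -1."""
    w = {}
    for _ in range(epochs):
        for x, y in zip(vectors, labels):
            violated = y * dot(w, x) < 1.0   # margin check before the update
            for t in w:                      # gradient of the L2 penalty
                w[t] *= 1.0 - lr * lam
            if violated:                     # hinge-loss subgradient step
                for t, v in x.items():
                    w[t] = w.get(t, 0.0) + lr * y * v
    return w


# Hypothetical toy requirements, invented for illustration only.
# +1 = functional, -1 = non-functional.
train = [
    ("el sistema debe permitir al usuario registrar un pedido", +1),
    ("el usuario puede consultar el historial de pedidos", +1),
    ("el sistema debe enviar un correo de confirmación", +1),
    ("el sistema debe responder en menos de dos segundos", -1),
    ("la interfaz debe ser fácil de usar", -1),
    ("los datos deben almacenarse cifrados", -1),
]
docs = [text for text, _ in train]
labels = [label for _, label in train]

idf = fit_idf(docs)
weights = train_linear_svm([tfidf_vector(d, idf) for d in docs], labels)


def classify(text):
    """Label a requirement by the sign of the linear SVM score."""
    score = dot(weights, tfidf_vector(text, idf))
    return "functional" if score > 0 else "non-functional"


print(classify("registrar un pedido"))
print(classify("datos cifrados"))
```

The two queries use only words that occur in a single class of the toy data, so they show the mechanics rather than any realistic accuracy. Reproducing the reported F1-scores would require the authors' dataset and a full evaluation setup.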

María Isabel Limaylla Lunarejo | Nelly Condori-Fernández | M. R. Luaces
