Developing and Implementing an Artificial Intelligence-Based Classifier for Requirements Engineering

In nuclear power plant (NPP) projects, requirements engineering must manage the sheer volume of requirements, which are typically descriptive and nonharmonized. Large projects may have tens of thousands to hundreds of thousands of requirements to be managed and fulfilled. Two main issues impede requirements analysis: the tortuous wording of the requirements to be interpreted, and humans' very limited ability to concentrate on a specific task for long periods. It has therefore been recognized that artificial intelligence (AI) algorithms have the potential to support designers' decision making when classifying and allocating NPP requirements into predefined classes. This paper presents our work on developing an AI-based requirements classifier that utilizes natural language processing (NLP) and supervised machine learning (ML). In addition, the paper presents the integration of the classifier with the requirements management system. The focus is on the classification of nuclear power industry-specific requirements using deep-learning-based NLP. Three classifiers are compared, and the corresponding results are presented. The results include the predetermined requirement classes, the manually gathered and classified data, a comparison of the three models and their classification accuracies, the microservice system architecture, and the integration of the established classifier with the requirements management system. As the performance of the requirements classifier and the related system has been successfully demonstrated, future AI-specific development and studies are suggested to focus on atomizing multiclass requirements, combining similar requirements into one, checking requirements syntax, and utilizing unsupervised learning for clustering. Furthermore, new and more advantageous requirement classes and hierarchies are suggested for development, while the current datasets should be improved both quantitatively and qualitatively.
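To make the approach concrete, the sketch below shows how such a requirements classifier could be fine-tuned, assuming a BERT-style transformer accessed through the HuggingFace Transformers library on PyTorch. The model checkpoint, the requirement classes, and the training examples are illustrative placeholders, not the configuration actually used in this work.

```python
# Minimal sketch: fine-tuning a BERT-style classifier to map requirement
# text to predefined classes. Checkpoint, classes, and data are assumptions.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Hypothetical requirement classes (placeholders, not the paper's taxonomy).
CLASSES = ["safety", "electrical", "mechanical", "process"]

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=len(CLASSES)
)

# Illustrative labelled requirements: (text, class index).
train_data = [
    ("The emergency cooling system shall start within 30 seconds.", 0),
    ("Cabling shall follow the plant-wide earthing scheme.", 1),
]

optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)
model.train()
for _ in range(3):  # a few passes over the toy data
    for text, label in train_data:
        batch = tokenizer(text, truncation=True, return_tensors="pt")
        loss = model(**batch, labels=torch.tensor([label])).loss
        loss.backward()
        optimizer.step()
        optimizer.zero_grad()

# Inference: predict the class of a previously unseen requirement.
model.eval()
with torch.no_grad():
    batch = tokenizer("Isolation valves shall be seismically qualified.",
                      truncation=True, return_tensors="pt")
    predicted = model(**batch).logits.argmax(dim=-1).item()
print(CLASSES[predicted])
```

The abstract also reports a microservice architecture that integrates the classifier with the requirements management system. A minimal sketch of such a service endpoint follows, assuming FastAPI and reusing `tokenizer`, `model`, and `CLASSES` from the sketch above; the route name and payload schema are likewise assumptions, not the paper's actual interface.

```python
# Minimal sketch: exposing the fine-tuned classifier as a microservice
# that a requirements management system could call over HTTP.
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()

class Requirement(BaseModel):
    text: str  # raw requirement text sent by the requirements management system

@app.post("/classify")
def classify(req: Requirement):
    batch = tokenizer(req.text, truncation=True, return_tensors="pt")
    with torch.no_grad():
        predicted = model(**batch).logits.argmax(dim=-1).item()
    return {"class": CLASSES[predicted]}
```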
