SVM Machine Learning Classifier to Automate the Extraction of SRS Elements

Extracting software entities such as the system, use cases, and actors from an English natural language description of a user's software requirements is a linguistic and semantic task in natural language processing. Researchers in linguistics and computing regard entity extraction as a complicated and challenging problem because of the ambiguities of natural languages. This paper presents a named entity recognition method called SyAcUcNER (System Actor Use-Case Named Entity Recognizer) for extracting the system, actor, and use case entities from unstructured English descriptions of user requirements for software. SyAcUcNER uses a Machine Learning (ML) approach, the Support Vector Machine (SVM), as its classifier, and it uses a semantic role labeling process to tag the words in the text of the user's software requirements. SyAcUcNER is the first work to define the structure of a NER specialized for requirements engineering, the first to use a specialized NER model to extract actor and use case entities from English-language requirements descriptions, and the first to use an SVM to specify the semantic meanings of words in a particular domain of discourse, namely the Software Requirements Specification (SRS). The performance of SyAcUcNER, which utilizes WEKA's SVM, is evaluated using a binomial technique. Running SyAcUcNER on text corpora from assorted sources gives weighted averages of 76.2% for precision, 76% for recall, and 72.1% for the F-measure.
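The weighted averages reported above can look surprising, since the F-measure falls below both precision and recall. A minimal Python sketch of one common way to compute such support-weighted averages follows. It is an illustration only, not the paper's evaluation code: the function name `weighted_metrics` and the confusion-matrix counts are hypothetical, and it assumes each class's score is weighted by the share of instances that truly belong to that class.

```python
from typing import Dict, List


def weighted_metrics(confusion: List[List[int]]) -> Dict[str, float]:
    """Support-weighted precision, recall and F-measure.

    confusion[i][j] is the number of items whose true class is i
    and whose predicted class is j (hypothetical layout).
    """
    n = len(confusion)
    total = sum(sum(row) for row in confusion)
    precision = recall = f_measure = 0.0
    for k in range(n):
        tp = confusion[k][k]
        support = sum(confusion[k])                         # true members of class k
        predicted = sum(confusion[i][k] for i in range(n))  # items labelled as class k
        p = tp / predicted if predicted else 0.0
        r = tp / support if support else 0.0
        f = 2 * p * r / (p + r) if (p + r) else 0.0
        weight = support / total
        precision += weight * p
        recall += weight * r
        f_measure += weight * f
    return {"precision": precision, "recall": recall, "f_measure": f_measure}


# Hypothetical counts: a rare "entity" class and a frequent "other" class.
example = [
    [10, 30],   # true entity: 10 found, 30 missed
    [5, 155],   # true other: 5 mislabelled as entity, 155 correct
]
metrics = weighted_metrics(example)
print(metrics)
```

Because the F-measure is computed per class before averaging, a poorly recognized minority class can pull the weighted F-measure below both the weighted precision and the weighted recall, which is the pattern seen in the reported figures.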
