Model elements identification using neural networks: a comprehensive study

Modeling natural language requirements, especially for a large system, can take significant effort and time. Many automated model-driven approaches partially address this problem. However, the application of state-of-the-art neural network architectures to automated model element identification has not yet been studied. In this paper, we perform an empirical study on the automatic identification of model elements for component state transition models from use case documents. Using six use case documents, we analyzed four neural network architectures and the trade-offs among them: a feedforward neural network, a convolutional neural network, a recurrent neural network (RNN) with long short-term memory (LSTM), and an RNN with gated recurrent units (GRU). We also analyzed how factors such as the type of splitting, prediction, design, and annotation affect the performance of the neural networks. Results on both the test data and unseen data showed that the RNN with GRU is the most effective architecture. However, which factors lead to effective predictions depends on the type of model element.
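To make the best-performing architecture concrete, the following is a minimal pure-Python sketch of one GRU time step and a left-to-right tagging pass over a sentence. It is an illustration under stated assumptions, not the authors' implementation: it assumes the task is framed as per-token labeling, and the label set `LABELS`, the embedding and hidden sizes, the random untrained weights, and the omission of biases and training are all hypothetical choices made here. It uses the convention h_t = (1 - z_t) * h_{t-1} + z_t * candidate; some libraries swap the roles of z_t and 1 - z_t.

```python
import math
import random
from typing import List, Sequence

Vector = List[float]
Matrix = List[List[float]]

# Hypothetical tag set for illustration only; not the paper's label scheme.
LABELS = ["O", "MODEL_ELEMENT"]


def sigmoid(x: float) -> float:
    # Numerically stable logistic function.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def matvec(m: Matrix, v: Sequence[float]) -> Vector:
    # (rows x cols) matrix times a length-cols vector.
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def rand_matrix(rows: int, cols: int, rng: random.Random) -> Matrix:
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


class GRUCell:
    """One GRU time step (biases omitted for brevity)."""

    def __init__(self, input_size: int, hidden_size: int, seed: int = 0) -> None:
        rng = random.Random(seed)
        self.hidden_size = hidden_size
        # W_* map the input token vector; U_* map the previous hidden state.
        self.W_z = rand_matrix(hidden_size, input_size, rng)
        self.U_z = rand_matrix(hidden_size, hidden_size, rng)
        self.W_r = rand_matrix(hidden_size, input_size, rng)
        self.U_r = rand_matrix(hidden_size, hidden_size, rng)
        self.W_h = rand_matrix(hidden_size, input_size, rng)
        self.U_h = rand_matrix(hidden_size, hidden_size, rng)

    def step(self, x: Sequence[float], h_prev: Sequence[float]) -> Vector:
        # Update gate z: how much of the new candidate to take in.
        z = [sigmoid(a + b) for a, b in zip(matvec(self.W_z, x), matvec(self.U_z, h_prev))]
        # Reset gate r: how much of the previous state feeds the candidate.
        r = [sigmoid(a + b) for a, b in zip(matvec(self.W_r, x), matvec(self.U_r, h_prev))]
        gated = [ri * hp for ri, hp in zip(r, h_prev)]
        cand = [math.tanh(a + b) for a, b in zip(matvec(self.W_h, x), matvec(self.U_h, gated))]
        # h_t = (1 - z) * h_{t-1} + z * candidate
        return [(1.0 - zi) * hp + zi * hc for zi, hp, hc in zip(z, h_prev, cand)]


def run_gru(cell: GRUCell, tokens: Sequence[Sequence[float]]) -> List[Vector]:
    # Left-to-right pass; returns the hidden state after each token.
    h = [0.0] * cell.hidden_size
    states = []
    for x in tokens:
        h = cell.step(x, h)
        states.append(h)
    return states


def tag_tokens(cell: GRUCell, out_weights: Matrix, tokens: Sequence[Sequence[float]]) -> List[str]:
    # Project each hidden state to one score per label and pick the argmax.
    tags = []
    for h in run_gru(cell, tokens):
        scores = matvec(out_weights, h)
        tags.append(LABELS[max(range(len(scores)), key=scores.__getitem__)])
    return tags


if __name__ == "__main__":
    rng = random.Random(1)
    # Four tokens with hypothetical 3-dimensional embeddings.
    sentence = [[rng.uniform(-1.0, 1.0) for _ in range(3)] for _ in range(4)]
    cell = GRUCell(input_size=3, hidden_size=5, seed=0)
    out_weights = rand_matrix(len(LABELS), 5, random.Random(2))
    print(tag_tokens(cell, out_weights, sentence))  # untrained, so tags are arbitrary
```

Because the weights are untrained, the printed tags carry no meaning; the sketch only shows how the update and reset gates let each hidden state blend the previous context with the current token before a per-token label is chosen. In practice the weights would be learned from annotated use case text.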
