Extracting domain models from natural-language requirements: approach and industrial evaluation

Domain modeling is an important step in the transition from natural-language requirements to precise specifications. For large systems, building a domain model manually is laborious. Several approaches assist engineers with this task by automatically extracting candidate domain model elements using Natural Language Processing (NLP). Despite this existing work on domain model extraction, important facets remain under-explored: (1) there is limited empirical evidence about the usefulness of existing extraction rules (heuristics) when applied in industrial settings; (2) existing extraction rules do not adequately exploit the natural-language dependencies detected by modern NLP technologies; and (3) an important class of rules developed by the information retrieval community for information extraction remains unutilized for building domain models. To address these limitations, we develop a domain model extractor by bringing together existing extraction rules from the software engineering literature, extending these rules with complementary rules from the information retrieval literature, and proposing new rules that better exploit the results obtained from modern NLP dependency parsers. We apply our model extractor to four industrial requirements documents and report how frequently each extraction rule is applied. We also conduct an expert study on one of these documents to investigate the accuracy and overall effectiveness of our domain model extractor.
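To illustrate what a dependency-based extraction rule can look like, the following is a minimal sketch and not the extractor described above. It assumes the dependency parse is already available as (relation, head, dependent) triples using the `nsubj` and `dobj` labels of the Stanford typed dependencies; the `Dep` tuple, the `extract_candidates` function, and the example sentence are hypothetical names and data introduced only for this illustration.

```python
from collections import namedtuple

# Hypothetical representation of one typed dependency: relation(head, dependent).
Dep = namedtuple("Dep", ["rel", "head", "dependent"])


def extract_candidates(deps):
    """Toy rule: a verb with both a subject and a direct object yields
    two candidate concepts and one candidate association between them."""
    subjects = {}  # verb -> list of subject words
    objects = {}   # verb -> list of direct-object words
    for d in deps:
        if d.rel == "nsubj":
            subjects.setdefault(d.head, []).append(d.dependent)
        elif d.rel == "dobj":
            objects.setdefault(d.head, []).append(d.dependent)

    concepts = []      # candidate concepts, in order of first appearance
    associations = []  # (source concept, verb label, target concept)
    for verb, subs in subjects.items():
        for subj in subs:
            for obj in objects.get(verb, []):
                associations.append((subj, verb, obj))
                for concept in (subj, obj):
                    if concept not in concepts:
                        concepts.append(concept)
    return concepts, associations


# Hand-written parse of the made-up sentence "The operator sends a command."
deps = [
    Dep("det", "operator", "The"),
    Dep("nsubj", "sends", "operator"),
    Dep("det", "command", "a"),
    Dep("dobj", "sends", "command"),
]

concepts, associations = extract_candidates(deps)
print(concepts)
print(associations)
```

A realistic extractor would obtain the triples from a dependency parser, normalize words to lemmas, and combine many such rules; this sketch only shows the shape of a single rule.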
