Text analysis for requirements engineering

Requirements engineering is the Achilles’ heel of the whole software development process. It involves many stakeholders and includes not only technical but also sociological and psychological activities. Even when all the stakeholders reach a consensus, the resulting requirements are rather informal. In the early project phases the functionality of the prospective software is not yet understood with the precision necessary for formalization, which makes requirements formalization not only a refinement but also a learning process. As the survey by Mich et al. [MFN04] shows, the overwhelming majority of requirements are written in natural language. In practice these documents are often vague and contain many ambiguities, which cause misunderstandings between project stakeholders. Misunderstandings and errors from the requirements engineering phase propagate to later development phases and can lead to project failure.

To alleviate misunderstandings and to support the step from informal requirements to a formal model, this thesis proposes a novel approach to extracting a domain ontology from requirements documents in order to establish a common language for the project stakeholders. An ontology consists of a set of terms and relations between these terms. Compared to a glossary, a domain-specific ontology defines the terms and the relations between them more explicitly. Once the ontology has been extracted, a domain expert validates it. The validated ontology becomes both the common language for all the project stakeholders and a valuable resource for later development steps.

The thesis makes two key contributions to ontology extraction as a part of requirements analysis:
• It implements a semiautomatic method that extracts an ontology from a requirements document and validates the extracted ontology.
• It shows how the traditional requirements analysis process should be modified to include ontology extraction and validation.

The feasibility of the proposed approach was evaluated in three comprehensive case studies.
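
The definition of an ontology given above (a set of terms plus relations between these terms) can be illustrated with a minimal Python sketch. The class names, method names, and sample terms are hypothetical and chosen only for illustration; they are not taken from the thesis or its case studies, and the sketch does not model the extraction or validation steps.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Relation:
    """A named, directed link between two terms (hypothetical structure)."""
    source: str
    label: str
    target: str


@dataclass
class Ontology:
    """A set of terms plus relations between these terms."""
    terms: set[str] = field(default_factory=set)
    relations: set[Relation] = field(default_factory=set)

    def add_relation(self, source: str, label: str, target: str) -> None:
        # Adding a relation also registers both of its endpoints as terms.
        self.terms.update({source, target})
        self.relations.add(Relation(source, label, target))

    def related_to(self, term: str) -> set[str]:
        # Terms reachable from `term` through one outgoing relation.
        return {r.target for r in self.relations if r.source == term}


# Illustrative content only; a real ontology would be extracted from a
# requirements document and then validated by a domain expert.
onto = Ontology()
onto.add_relation("pump", "is-a", "device")
onto.add_relation("controller", "opens", "valve")
```

Unlike a flat glossary, which would hold only the terms, this structure also records how the terms relate to each other, which is the distinction the abstract draws.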

[1]  Russ Abbott.  Program design by informal English descriptions , 1983, CACM.

[2]  Stefania Gnesi,et al.  The linguistic approach to the natural language requirements quality: benefit of the use of an automatic tool , 2001, Proceedings 26th Annual NASA Goddard Software Engineering Workshop.

[3]  Thomas R. Gruber.  Toward principles for the design of ontologies used for knowledge sharing , 1995 .

[4]  Daniel M. Berry,et al.  AbstFinder, A Prototype Natural Language Text Abstraction Finder for Use in Requirements Elicitation , 1997, Automated Software Engineering.

[5]  Camille Ben Achour,et al.  Linguistic Instruments for the Integration of Scenarios in Requirement Engineering , 1997 .

[6]  Barry W. Boehm,et al.  EasyWinWin: a groupware-supported methodology for requirements negotiation , 2001, ESEC/FSE-9.

[7]  Barbara Paech,et al.  Detecting Ambiguities in Requirements Documents Using Inspections , 2001 .

[8]  Ann Bies,et al.  Bracketing Guidelines For Treebank II Style Penn Treebank Project , 1995 .

[9]  Karin K. Breitman,et al.  Ontology as a requirements engineering product , 2003, Proceedings. 11th IEEE International Requirements Engineering Conference, 2003.

[10]  Cliff B. Jones,et al.  Systematic software development using VDM , 1986, Prentice Hall International Series in Computer Science.

[11]  Michael Jackson,et al.  Four dark corners of requirements engineering , 1997, TSEM.

[12]  Egon Börger,et al.  The Steam Boiler Case Study: Competition of Formal Program Specification and Development Methods , 1995, Formal Methods for Industrial Applications.

[13]  Mitchell P. Marcus,et al.  Maximum entropy models for natural language ambiguity resolution , 1998 .

[14]  Steffen Staab,et al.  Measuring Similarity between Ontologies , 2002, EKAW.

[15]  Sophia Ananiadou,et al.  Automatic Discovery of Term Similarities Using Pattern Mining , 2002, COLING-02 on COMPUTERM 2002 second international workshop on computational terminology.

[16]  Candace L. Sidner,et al.  Attention, Intentions, and the Structure of Discourse , 1986, CL.

[17]  Axel van Lamsweerde,et al.  Goal-Oriented Requirements Engineering: A Guided Tour , 2001, RE.

[18]  Vincenzo Gervasi,et al.  Experiences with Domain-Based Parsing of Natural Language Requirements , 1999 .

[19]  Vincenzo Gervasi.  Synthesizing ASMs from natural language requirements , 2001 .

[20]  Sunil Vadera,et al.  From English to Formal Specifications , 1994, Comput. J..

[21]  Ramakrishnan Srikant,et al.  Mining generalized association rules , 1995, Future Gener. Comput. Syst..

[22]  Vladimir I. Levenshtein,et al.  Binary codes capable of correcting deletions, insertions, and reversals , 1965 .

[23]  Bernhard Schätz,et al.  Consistent Graphical Specification of Distributed Systems , 1997, FME.

[24]  Vincenzo Gervasi,et al.  The Circe approach to the systematic analysis of NL requirements , 2003 .

[25]  Leonid Kof.  Using application domain ontology to construct an initial system model , 2004, IASTED Conf. on Software Engineering.

[26]  Colette Rolland,et al.  Guiding the Construction of Textual Use Case Specifications , 1998, Data Knowl. Eng..

[27]  Peter P. Chen.  English Sentence Structure and Entity-Relationship Diagrams , 1983, Inf. Sci..

[28]  N. F. Noy,et al.  Ontology Development 101: A Guide to Creating Your First Ontology , 2001 .

[29]  Michael Kohlhase,et al.  Inference and Computational Semantics , 2004, J. Log. Lang. Inf..

[30]  Adwait Ratnaparkhi,et al.  A Maximum Entropy Model for Part-Of-Speech Tagging , 1996, EMNLP.

[31]  Pamela Zave,et al.  Classification of research efforts in requirements engineering , 1995, Proceedings of 1995 IEEE International Symposium on Requirements Engineering (RE'95).

[32]  Renaud Lecoeuche.  Finding comparatively important concepts between texts , 2000, Proceedings ASE 2000. Fifteenth IEEE International Conference on Automated Software Engineering.

[34]  Suzanne Robertson,et al.  Mastering the Requirements Process , 1999 .

[35]  Leonid Kof,et al.  Validating Documentation with Domain Ontologies , 2005, SoMeT.

[36]  Michael Collins,et al.  Three Generative, Lexicalised Models for Statistical Parsing , 1997, ACL.

[37]  Norbert E. Fuchs,et al.  Attempto Controlled English (ACE) Language Manual, Version 3.0 , 1999 .

[38]  Steffen Staab,et al.  Discovering Conceptual Relations from Text , 2000, ECAI.

[39]  Y. S. Maarek,et al.  The use of lexical affinities in requirements extraction , 1989, IWSSD '89.

[40]  Egon Börger,et al.  Formal methods for industrial applications : specifying and programming the steam boiler control , 1996 .

[41]  Hajime Enomoto,et al.  Software development process from natural language specification , 1989, ICSE '89.

[42]  Luisa Mich,et al.  Market research for requirements analysis using linguistic tools , 2004, Requirements Engineering.

[43]  Martin F. Porter,et al.  An algorithm for suffix stripping , 1997, Program.

[44]  Judita Preiss.  Choosing a Parser for Anaphora Resolution , 2002 .

[45]  Leonid Kof,et al.  An application of natural language processing to domain modelling: two case studies , 2005, Comput. Syst. Sci. Eng..

[46]  L. Kof.  Natural Language Processing for Requirements Engineering: Applicability to Large Requirements Documents , 2004 .

[47]  Leonid Kof,et al.  Natural Language Processing: Mature Enough for Requirements Documents Analysis? , 2005, NLDB.

[48]  Bram van der Vos,et al.  NL Structures and Conceptual Modelling: Grammalizing for KISS , 1997, Data Knowl. Eng..

[49]  Michael Collins,et al.  Head-Driven Statistical Models for Natural Language Parsing , 2003, CL.

[50]  Hideki Mima,et al.  The ATRACT Workbench: Automatic Term Recognition and Clustering for Terms , 2001, TSD.

[51]  Björn Regnell,et al.  A Feasibility Study of Automated Natural Language Requirements Analysis in Market-Driven Development , 2002, Requirements Engineering.

[52]  Thomas R. Gruber,et al.  Toward principles for the design of ontologies used for knowledge sharing , 1995, Int. J. Hum. Comput. Stud..

[53]  David Faure,et al.  ASIUM: Learning subcategorization frames and restrictions of selection , 1998 .