Paper information - Language engineering for the recovery of requirements from legacy documents

Legacy documents, such as requirements documents or manuals of business procedures, can sometimes be an important resource for determining which features of legacy software are redundant, need to be retained or can be reused. The need is particularly acute where business change has resulted in the dissipation of human knowledge through staff turnover or redeployment. Exploiting legacy documents poses formidable problems, however, since they are often incomplete, poorly structured, poorly maintained and voluminous. This report proposes that language engineering, using tools that exploit probabilistic natural language processing (NLP) techniques, offers the potential to ease these problems. Such tools are available, mature and have been proven in other domains. The report provides a review of NLP and a discussion of the components of probabilistic NLP techniques and their potential for requirements recovery from legacy documents. It concludes with a summary of the preliminary results of the adaptation and application of these techniques in the REVERE project.
