Natural Language Processing with Process Models (NLP4RE Report Paper)

This paper is a report paper that focuses on research at the intersection of business process management and requirements engineering. It gives an overview of research on natural language processing with process models, organized in terms of 25 challenges. This research line is pursued in a cross-university collaboration between the authors and further colleagues. We describe the most important contributions of the authors and highlight directions for future research.

1 Team Overview

The research team has a track record of over ten years of joint work in the area of natural language processing with process models. Since several members have changed affiliation, the collaboration has evolved into a virtual research team. The team works in the area of business process management [DRMR18] and conducts research on the analysis of business process models, including process model verification, refactoring, change propagation, matching, process mining, conformance checking, guidelines, and human comprehension of process models. Business process management, as understood in our tradition, builds strongly on research into the design of workflow systems in the 1990s and on the configuration of ERP systems with the help of business process models in the 2000s. It shares the ambition of requirements engineering to understand the application domain, the operational constraints, and the functionality needed by stakeholders [Som05], but it is more specific in its focus on a particular class of systems, namely systems that support an organization in executing its business processes.

2 Past Research on NLP for Requirements Engineering with Process Models

In order to organize our previous research in the area of NLP for requirements engineering with a specific focus on process models, we have developed a framework that includes a list of 25 challenges.
These challenges relate to integrating requirements in the form of process models more efficiently, validating their correctness, completeness, and consistency, and extracting information that supports the design and implementation of a system for executing the business process. As Figure 1 illustrates, the 25 challenges can be organized into three major categories: challenges related to the automatic processing of labels (C1-C7), to labels in process models (C8-C19), and to process model repositories as a whole (C20-C25) [MLP14]. Several of these challenges have been addressed by our research and also by other research teams. In the following, we discuss a selection of our works to illustrate the spectrum of contributions made in this area of research. Several of these works have been published in renowned journals, including IEEE Transactions on Software Engineering, Information & Software Technology, Decision Support Systems, and Information Systems.

This research was initially sparked by the observation that the textual labels of process models can be formulated well or poorly. This observation motivated the use of natural language processing techniques to improve the text labels of process models. Such a technique can be understood as a specific type of refactoring of process models, with the aim of making them easier for humans to understand. To this end, we developed techniques to automatically identify different labeling styles [LSM11] and guideline violations [LEM13], based on which we could then refactor the labels [LSM12]. Recently, we developed a novel label parsing technique that can be used to better address these use cases [LvdAOR19]. With these works, we addressed Challenges C1 and C2.

Copyright © 2019 by the paper's authors. Copying permitted for private and academic purposes.
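The kind of label analysis behind Challenges C1 and C2 can be illustrated with a small Python sketch. It is a toy, rule-based approximation and not the technique of [LSM11], [LSM12], or [LvdAOR19]: the verb and action-noun lexicons are hypothetical, and only two labeling styles (verb-object and action-noun) are distinguished.

```python
# Toy sketch: rule-based labeling-style recognition (C1) and refactoring (C2).
# The lexicons below are hypothetical stand-ins for a real POS tagger or parser.

VERBS = {"check", "approve", "screen", "create", "send", "read"}

# Hypothetical map from action nouns (nominalizations) to their base verbs.
ACTION_NOUNS = {
    "checking": "check",
    "approval": "approve",
    "screening": "screen",
    "creation": "create",
    "reading": "read",
}


def label_style(label: str) -> str:
    """Classify a label as 'verb-object', 'action-noun', or 'unknown'."""
    words = label.lower().split()
    if not words:
        return "unknown"
    if words[0] in VERBS:
        return "verb-object"  # e.g. "Check invoice"
    if words[-1] in ACTION_NOUNS:
        return "action-noun"  # e.g. "Invoice checking"
    return "unknown"


def refactor_label(label: str) -> str:
    """Rewrite action-noun labels into the verb-object style."""
    if label_style(label) != "action-noun":
        return label  # verb-object and unrecognized labels stay untouched
    words = label.lower().split()
    verb = ACTION_NOUNS[words[-1]]
    business_object = " ".join(words[:-1])
    return f"{verb.capitalize()} {business_object}".strip()


if __name__ == "__main__":
    for lbl in ["Check invoice", "Invoice checking", "Delivery documents screening"]:
        print(f"{lbl!r}: {label_style(lbl)} -> {refactor_label(lbl)!r}")
```

With these lexicons, "Invoice checking" is classified as action-noun and rewritten to "Check invoice", and "Label Reading" becomes "Read label". The lexicon lookup only keeps the example self-contained; words such as "check" can be a verb or a noun, which is why realistic label analysis needs proper linguistic parsing.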
This foundational set of techniques was then extended in different directions, most notably towards translation, semantic processing, and conformance checking between process models and text, as discussed next.

2.1 Translations between Process Models and Text

An important question for processing text and models is to what extent automatic translations are feasible. We addressed this question in both directions: from text to process model and from process model to text.

Our research on the translation from text to process model [FMP11] addresses various challenges that we organize in four categories. The first category, Syntactic Leeway, includes problems that stem from switches between active and passive voice in the input text, from potential rewording and changes of order, and from conditions that are not made explicit. The second category, Atomicity, refers to the facts that a single sentence can be as complex as a whole model fragment, that an activity can be split across several sentences, and that relative clauses have to be dealt with. The third category, Relevance, acknowledges that relative clauses, example sentences, or meta-statements should not lead to model elements. The fourth category, Referencing, deals with anaphora, textual links, and end-of-block recognition. The proposed translation technique works from the sentence level to the text level and creates a process model automatically. Using a test set of 47 text-model pairs, we achieved an average translation accuracy of 77%. This work has recently been extended with a structural analysis of the texts and an analysis of sentence templates in order to address potential issues of ambiguity [STW18], and it is currently being integrated into a service-oriented architecture for the generation of process-oriented text.

Our complementary research on the translation from process model to text for validation purposes [LMP14] addresses various challenges that stem from parsing the formal structure of the process model.
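The text-to-model direction [FMP11] can be illustrated with a deliberately small Python sketch that touches two of its challenge categories: it normalizes active and passive voice (Syntactic Leeway) and drops sentences that match no template (Relevance), then chains the extracted activities into a sequence. The sentence templates, the verb lexicon, and the names Activity, parse_sentence, and text_to_sequence are hypothetical; the actual technique relies on much richer linguistic analysis and also handles Atomicity and Referencing, which this sketch ignores.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Hypothetical lexicon mapping inflected verb forms to base forms;
# a real system would use a morphological analyzer.
VERB_FORMS = {
    "receives": "receive", "received": "receive",
    "checks": "check", "checked": "check",
    "approves": "approve", "approved": "approve",
    "sends": "send", "sent": "send",
}

# Two hypothetical sentence templates: active and passive voice.
ACTIVE = re.compile(r"^the (?P<actor>[a-z ]+?) (?P<verb>[a-z]+) the (?P<obj>[a-z ]+)$")
PASSIVE = re.compile(
    r"^the (?P<obj>[a-z ]+?) (?:is|are) (?P<verb>[a-z]+) by the (?P<actor>[a-z ]+)$"
)


@dataclass(frozen=True)
class Activity:
    actor: str
    label: str  # verb-object label, e.g. "Check order"


def parse_sentence(sentence: str) -> Optional[Activity]:
    """Map one sentence to an activity, or None if no template applies."""
    text = sentence.strip().rstrip(".").lower()
    text = re.sub(r"^then,? ", "", text)  # "then" only signals sequential order
    # Passive is tried first, so passive sentences never reach the looser active template.
    for pattern in (PASSIVE, ACTIVE):
        match = pattern.match(text)
        if match and match.group("verb") in VERB_FORMS:
            verb = VERB_FORMS[match.group("verb")]
            return Activity(match.group("actor"), f"{verb.capitalize()} {match.group('obj')}")
    return None  # irrelevant or unsupported sentence: no model element


def text_to_sequence(text: str):
    """Translate a text into activities plus sequence flows between consecutive ones."""
    sentences = [s for s in re.split(r"(?<=\.)\s+", text.strip()) if s]
    activities = [a for a in map(parse_sentence, sentences) if a is not None]
    flows = list(zip(activities, activities[1:]))
    return activities, flows
```

For example, "The clerk receives the order. Then the order is checked by the manager. This step is important. The manager sends the invoice." yields the activities Receive order (clerk), Check order (manager), and Send invoice (manager), connected by two sequence flows, while the meta-statement in the third sentence produces no model element.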
More specifically, for the translation from process model to text, we distinguish four categories of challenges. The first category, Text Planning, deals with linguistic information extraction, model linearization, and text structuring. The second category, Sentence Planning, includes lexicalization and message refinement. The third category, Surface Realization, relates to interfacing with established realizers. The fourth category, Flexibility, addresses variations of the input data and adaptation of the output. The proposed technique first extracts linguistic information from the process model elements, then parses the process model graph into a refined process structure tree (RPST), and structures the text based on the fragments of this tree. The resulting information is captured in deep syntax trees, on which message refinement is applied. Finally, a realizer generates the resulting natural language text. Our evaluation demonstrates that the generated texts are highly accurate and that a back-translation hardly entails any loss of information.

2.2 Semantic Processing of Process Models and Text

Each of these translation techniques takes the textual content as given. This is problematic because terms are often ambiguous, and it is the starting point of our research on the automatic detection and resolution of lexical ambiguity in process models [PLM15]. The corresponding technique covers homonym detection and resolution as well as synonym detection and resolution. It was evaluated on a collection of more than 2,000 process models from practice, containing more than 20,000 text labels in total. The evaluation indicates that the technique finds homonymous usage of terms such as application, case, or incident, as well as synonymous word pairs such as check-control, create-produce, and customer-client, and that automatic resolution significantly reduces ambiguity.

A key problem of processing the text labels of models in practice is that practitioners often do not use these labels in a canonical way.
Examples are activity labels like "Screen delivery documents if necessary" or update

[Figure 1 (excerpt): C1: Identify Label Grammar ("Read Label": verb, object); C2: Refactor Label Grammar ("Label Reading" → "Read Label"); C3: Disambiguate Label Terms ("Call Bank" → "Call Bank (Financials)"); C4: Refactor Label Terms ("Call Bank" → "Contact Financial Institution"); C5: Auto-Complete Label]
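The synonym half of the lexical ambiguity problem from Section 2.2 can be sketched in Python as follows. The sketch is a simplification and not the [PLM15] technique: it uses a hypothetical, hand-made list of synonym sets seeded with the word pairs reported above, flags a set whenever more than one of its terms occurs in a label collection, and resolves it by replacing all of its terms with the most frequent one (an assumed policy). Homonym detection, which requires looking at the context in which a term is used, is not covered.

```python
from collections import Counter

# Hypothetical synonym sets, seeded with the word pairs reported in Section 2.2.
SYNONYM_SETS = [
    {"check", "control"},
    {"create", "produce"},
    {"customer", "client"},
]


def detect_synonyms(labels):
    """Return, for each synonym set used with more than one term in the
    collection, a dict mapping each used term to its frequency."""
    counts = Counter(word for label in labels for word in label.lower().split())
    findings = []
    for synset in SYNONYM_SETS:
        used = {term: counts[term] for term in synset if counts[term] > 0}
        if len(used) > 1:  # several terms for the same concept
            findings.append(used)
    return findings


def resolve_synonyms(labels):
    """Replace each flagged term by the most frequent term of its set
    (ties broken alphabetically) and return the rewritten labels."""
    replacement = {}
    for used in detect_synonyms(labels):
        canonical = min(used, key=lambda term: (-used[term], term))
        for term in used:
            replacement[term] = canonical
    resolved = []
    for label in labels:
        words = [replacement.get(w.lower(), w.lower()) for w in label.split()]
        resolved.append(" ".join(words).capitalize())
    return resolved
```

For the labels Check invoice, Control invoice, Check order, Notify customer, and Inform client, the sketch flags check-control and customer-client and rewrites the labels to Check invoice, Check invoice, Check order, Notify client, and Inform client; the tie between customer and client is broken alphabetically.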

Jan Mendling | Henrik Leopold | Lucinéia Heloisa Thom | Han van der Aa

[1] Jan Mendling et al. 25 Challenges of Semantic Process Modeling, 2014.

[2] Josep Carmona et al. Aligning textual and model-based process descriptions, 2018, Data Knowl. Eng.

[3] Ian Sommerville et al. Integrated requirements engineering: a tutorial, 2005, IEEE Software.

[4] Jan Mendling et al. On the refactoring of activity labels in business process models, 2012, Inf. Syst.

[5] Jan Mendling et al. Challenges of smart business process management: An introduction to the special issue, 2017, Decis. Support Syst.

[6] Hajo A. Reijers et al. Comparing textual descriptions to process models - The automatic detection of inconsistencies, 2017, Inf. Syst.

[7] Remco M. Dijkman et al. Report: The Process Model Matching Contest 2013, 2013, Business Process Management Workshops.

[8] Jan Mendling et al. Automatic Detection and Resolution of Lexical Ambiguity in Process Models, 2015, IEEE Transactions on Software Engineering.

[9] Jan Mendling et al. Recognising Activity Labeling Styles in Business Process Models, 2011, Enterp. Model. Inf. Syst. Archit. Int. J. Concept. Model.

[10] Hajo A. Reijers et al. Checking process compliance against natural language specifications using behavioral spaces, 2018, Inf. Syst.

[11] Jan Mendling et al. Supporting Process Model Validation through Natural Language Generation, 2014, IEEE Transactions on Software Engineering.

[12] Hajo A. Reijers et al. Using Hidden Markov Models for the accurate linguistic analysis of process model activity labels, 2019, Inf. Syst.

[13] Inge van de Weerd et al. Causes and Consequences of Fragmented Process Information: Insights from a Case Study, 2017, AMCIS.

[14] Leonardo Guerreiro Azevedo et al. Detection of naming convention violations in process models for different languages, 2013, Decis. Support Syst.

[15] Jan Mendling et al. Searching textual and model-based process descriptions based on a unified data format, 2017, Software & Systems Modeling.

[16] Jan Mendling et al. Process Model Generation from Natural Language Text, 2011, CAiSE.

[17] Marcelo Fantinato et al. Empirical Analysis of Sentence Templates and Ambiguity Issues for Business Process Descriptions, 2018, OTM Conferences.

[18] Hajo A. Reijers et al. Extracting Declarative Process Models from Natural Language, 2019, CAiSE.

[19] Jan Mendling et al. An experiment on an ontology-based support approach for process modeling, 2017, Inf. Softw. Technol.

[20] Jan Mendling et al. Ensuring the canonicity of process models, 2017, Data Knowl. Eng.