A Deep Context-wise Method for Coreference Detection in Natural Language Requirements

Requirements are usually written by different stakeholders with diverse backgrounds and skills, and they evolve continuously. Therefore, inconsistency caused by specialized jargon and different domains is inevitable. In particular, entity coreference in Requirements Engineering (RE) occurs when different linguistic expressions refer to the same real-world entity. It leads to misconceptions about technical terminology and negatively impacts the readability and understandability of requirements. Manual detection of entity coreference is labor-intensive and time-consuming. In this paper, we propose a DEEP context-wise semantic method named DeepCoref for entity COREFerence detection. It consists of a fine-tuned BERT model for context representation and a Word2Vec-based network for entity representation. Finally, a multi-layer perceptron fuses the two representations and balances them to obtain a better representation of entities. The network takes the requirement context text and the related entities as input, and outputs a predicted label indicating whether two entities are coreferent. Evaluation on industrial data shows that our approach significantly outperforms three baselines, with average precision and recall of 96.10% and 96.06%, respectively. We also compare DeepCoref with three variants to demonstrate the performance gains contributed by its different components.
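The architecture described above can be read as a pair classifier: one vector for the requirement context, one vector per entity, and a multi-layer perceptron that fuses them into a coreferent/not-coreferent decision. The following is a minimal, dependency-free sketch of that data flow under stated assumptions, not the authors' implementation. The fine-tuned BERT context representation is replaced by a precomputed placeholder vector, the Word2Vec-based entity network is approximated by averaging token vectors, and the fusion is assumed to be concatenation followed by a two-layer MLP with a sigmoid output. All names (`entity_vector`, `FusionMLP`, `pair_features`, `is_coreferent`) and the toy vectors are hypothetical, and the weights are untrained, so the printed probability only illustrates the computation.

```python
import math
import random
from typing import Dict, List

Vector = List[float]


def entity_vector(entity: str, word_vecs: Dict[str, Vector], dim: int) -> Vector:
    """Hypothetical stand-in for the Word2Vec-based entity network:
    average the vectors of the entity's known tokens (unknown tokens are skipped)."""
    known = [word_vecs[t] for t in entity.lower().split() if t in word_vecs]
    if not known:
        return [0.0] * dim
    return [sum(v[i] for v in known) / len(known) for i in range(dim)]


def dense(x: Vector, weights: List[Vector], bias: Vector) -> Vector:
    """Fully connected layer: out[j] = sum_i weights[j][i] * x[i] + bias[j]."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]


class FusionMLP:
    """Two-layer perceptron over the concatenation [context ; entity1 ; entity2].
    Weights are random here; in practice they would be trained on labeled pairs."""

    def __init__(self, in_dim: int, hidden_dim: int, seed: int = 0) -> None:
        rng = random.Random(seed)
        self.in_dim = in_dim
        self.w1 = [[rng.uniform(-0.5, 0.5) for _ in range(in_dim)] for _ in range(hidden_dim)]
        self.b1 = [0.0] * hidden_dim
        self.w2 = [[rng.uniform(-0.5, 0.5) for _ in range(hidden_dim)]]
        self.b2 = [0.0]

    def predict_proba(self, features: Vector) -> float:
        if len(features) != self.in_dim:
            raise ValueError(f"expected {self.in_dim} features, got {len(features)}")
        hidden = [max(0.0, h) for h in dense(features, self.w1, self.b1)]  # ReLU
        logit = dense(hidden, self.w2, self.b2)[0]
        return 1.0 / (1.0 + math.exp(-logit))  # sigmoid -> P(coreferent)


def pair_features(context: Vector, e1: str, e2: str,
                  word_vecs: Dict[str, Vector], dim: int) -> Vector:
    """Assumed fusion input: context vector concatenated with both entity vectors."""
    return context + entity_vector(e1, word_vecs, dim) + entity_vector(e2, word_vecs, dim)


def is_coreferent(model: FusionMLP, context: Vector, e1: str, e2: str,
                  word_vecs: Dict[str, Vector], dim: int, threshold: float = 0.5) -> bool:
    """Binary label: coreferent if the predicted probability reaches the threshold."""
    return model.predict_proba(pair_features(context, e1, e2, word_vecs, dim)) >= threshold


# Toy usage with hypothetical 3-d word vectors and a 4-d placeholder context vector.
WORD_VECS = {
    "brake": [0.9, 0.1, 0.0],
    "controller": [0.2, 0.8, 0.1],
    "bcu": [0.6, 0.5, 0.1],
}
CONTEXT = [0.1, -0.3, 0.7, 0.2]  # placeholder for a fine-tuned BERT context embedding
model = FusionMLP(in_dim=len(CONTEXT) + 2 * 3, hidden_dim=8)
p = model.predict_proba(pair_features(CONTEXT, "brake controller", "BCU", WORD_VECS, 3))
print(f"P(coreferent) = {p:.3f}")
```

In a real pipeline, `CONTEXT` would come from the fine-tuned BERT encoder applied to the requirement text, `WORD_VECS` from a trained Word2Vec model, and the MLP weights would be learned jointly; concatenation is only one plausible way to let the MLP weigh the context representation against the entity representations.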
