Extraction of Useful Information from Unstructured Data in Software Engineering: A Systematic Mapping

Context: A large amount of information is generated and manipulated in Software Engineering (SE) projects. The technology surrounding this domain is constantly evolving. To keep up with this evolution, developers share their knowledge and seek help from other developers through interactive and collaborative environments. Understanding and extracting knowledge from these environments can help developers identify information that is useful for a project. Objective: This work aims to identify the main textual analysis approaches used to extract useful information in SE. Method: To achieve this objective, we conducted a Systematic Mapping (SM). Results: We analyzed 69 relevant primary studies addressing approaches to extract useful information in SE. Conclusion: Among the main conclusions of this study, we can infer that discussion forums have attracted significant attention in the SE context and have become one of the main textual databases investigated to extract useful information.
