DPWord2Vec: Better Representation of Design Patterns in Semantics

Plain-text descriptions of design patterns help developers learn and understand the definitions and usage scenarios of those patterns. To support automated use of these descriptions, e.g., recommending design patterns from free-text queries, design patterns and natural language must be adequately associated. Existing studies usually take the texts in design pattern books as the representations of design patterns and calculate their similarities with the queries. This approach is problematic: much information about design patterns may be absent from the books, and many query words fall outside the books' limited vocabulary. To overcome these issues, a more comprehensive method is needed to estimate the relatedness between design patterns and natural language words. Motivated by Word2Vec, in this study we propose DPWord2Vec, which embeds design patterns and natural language words into vectors simultaneously. We first build a corpus containing more than 400 thousand documents extracted from design pattern books, Wikipedia, and Stack Overflow. Next, we redefine the concept of context window to associate design patterns with words. Then, the design pattern and word vector representations are learnt by leveraging an advanced word embedding method. The learnt design pattern and word vectors can be used universally in design pattern tasks based on textual descriptions. An evaluation shows that DPWord2Vec outperforms the baseline algorithms by 24.2-120.9 percent in measuring the similarities between design patterns and words in terms of Spearman’s rank correlation coefficient. Moreover, we apply DPWord2Vec to two typical design pattern tasks.
In the design pattern tag recommendation task, the DPWord2Vec-based method outperforms two state-of-the-art algorithms by 6.6 and 32.7 percent respectively when considering Recall@10. In the design pattern selection task, DPWord2Vec improves the existing methods by 6.5-70.7 percent in terms of MRR.
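
The two task metrics named above, Recall@10 and MRR, can be illustrated with a minimal sketch. It assumes the common textbook definitions (the fraction of relevant items found in the top k, and the mean of 1/rank of the first relevant item); the exact variants used in the paper's evaluation may differ. The function names and the toy pattern names are hypothetical and are not taken from the paper.

```python
from typing import Hashable, Iterable, Sequence


def recall_at_k(ranked: Sequence[Hashable],
                relevant: Iterable[Hashable],
                k: int = 10) -> float:
    """Fraction of the relevant items that appear in the top-k of a ranked list."""
    relevant_set = set(relevant)
    if not relevant_set:
        return 0.0
    hits = sum(1 for item in ranked[:k] if item in relevant_set)
    return hits / len(relevant_set)


def mean_reciprocal_rank(rankings: Sequence[Sequence[Hashable]],
                         relevants: Sequence[Iterable[Hashable]]) -> float:
    """Mean over queries of 1/rank of the first relevant item (0 if none is ranked)."""
    total = 0.0
    count = 0
    for ranked, relevant in zip(rankings, relevants):
        relevant_set = set(relevant)
        reciprocal = 0.0
        for position, item in enumerate(ranked, start=1):
            if item in relevant_set:
                reciprocal = 1.0 / position
                break
        total += reciprocal
        count += 1
    return total / count if count else 0.0


if __name__ == "__main__":
    # Toy data (assumed for illustration): a ranked list of pattern tags for one post.
    ranked_tags = ["observer", "singleton", "factory", "adapter"]
    print(recall_at_k(ranked_tags, {"singleton", "visitor"}, k=2))

    # Two toy queries, each with one correct design pattern.
    rankings = [["a", "b", "c"], ["x", "y", "z"]]
    relevants = [{"b"}, {"z"}]
    print(mean_reciprocal_rank(rankings, relevants))
```

In a setting like the one described, the ranked lists would come from sorting design patterns (or tags) by the similarity between their vectors and the query words; that ranking step is outside this sketch.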
