Feature-Based Software Design Pattern Detection

Software design patterns are standard solutions to common problems in software design and architecture. Knowing that a particular module implements a design pattern is a shortcut to design comprehension. Manually detecting design patterns is a time-consuming and challenging task; therefore, researchers have proposed automatic design pattern detection techniques to assist software developers. However, these techniques show low performance for certain design patterns. In this work, we introduce an approach, DPD_F, that improves on the state of the art by using code features with machine learning classifiers to automatically train a design pattern detector. We create a semantic representation of Java source code from the code features and the call graph, and apply the Word2Vec algorithm to this representation to construct a word-space geometric model of the source code. DPD_F then trains a machine learning classifier on the word-space model and identifies software design patterns with 78% Precision and 76% Recall. Additionally, we compare our results with two existing design pattern detection approaches, FeatureMaps and MARPLE-DPD. Empirical results demonstrate that our approach outperforms these benchmark approaches by 30% and 10%, respectively, in terms of Precision. The runtime performance also supports its practical applicability.
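
As an illustration of how Precision and Recall figures such as those above can be computed for a multi-class pattern detector, the following is a minimal sketch in Python. It is not the evaluation code of this work: the function names, the pattern labels, and the toy predictions are all hypothetical, and the use of macro-averaging is an assumption, since the abstract does not state how the per-pattern scores are aggregated.

```python
from collections import Counter


def per_class_precision_recall(y_true, y_pred):
    """Return {label: (precision, recall)} for multi-class predictions."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # pattern was predicted, but the prediction is wrong
            fn[truth] += 1  # an instance of the true pattern was missed
    scores = {}
    for label in sorted(set(y_true) | set(y_pred)):
        predicted = tp[label] + fp[label]
        actual = tp[label] + fn[label]
        precision = tp[label] / predicted if predicted else 0.0
        recall = tp[label] / actual if actual else 0.0
        scores[label] = (precision, recall)
    return scores


def macro_average(scores):
    """Unweighted mean of per-class precision and recall (an assumed aggregation)."""
    n = len(scores)
    mean_precision = sum(p for p, _ in scores.values()) / n
    mean_recall = sum(r for _, r in scores.values()) / n
    return mean_precision, mean_recall


# Hypothetical ground truth and detector output for six classes of a Java project.
y_true = ["Singleton", "Singleton", "Observer", "Observer", "Adapter", "Adapter"]
y_pred = ["Singleton", "Observer", "Observer", "Observer", "Adapter", "Singleton"]

scores = per_class_precision_recall(y_true, y_pred)
macro_p, macro_r = macro_average(scores)

for label, (p, r) in scores.items():
    print(f"{label}: precision={p:.2f} recall={r:.2f}")
print(f"macro: precision={macro_p:.2f} recall={macro_r:.2f}")
```

Reporting per-pattern scores alongside the aggregate matters here because, as noted above, detection performance can differ considerably between design patterns.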
