DeepIaC: deep learning-based linguistic anti-pattern detection in IaC

Linguistic anti-patterns are recurring poor practices concerning inconsistencies among the naming, documentation, and implementation of an entity. They impede the readability, understandability, and maintainability of source code. This paper aims to detect linguistic anti-patterns in infrastructure as code (IaC) scripts, which are used to provision and manage computing environments. In particular, we consider inconsistencies between the logic (body) of IaC code units and their names. To this end, we propose a novel automated approach that employs word embeddings and deep learning techniques. We build the abstract syntax tree of each IaC code unit and use it to create the unit's code embedding. Our experiments with a dataset systematically extracted from open source repositories show that our approach yields an accuracy between 0.785 and 0.915 in detecting inconsistencies.
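
The abstract does not detail the preprocessing, so the following is only a minimal sketch under stated assumptions. It assumes IaC code units are Ansible-style tasks already parsed from YAML into Python dicts, linearizes each unit's body by a pre-order traversal (a simple stand-in for the AST-based representation), and builds labeled name/body pairs for a binary consistency classifier. Producing negative examples by pairing a body with another unit's name is an assumption borrowed from common practice in learned name-consistency checkers, not necessarily the paper's procedure. The functions `split_identifier`, `linearize`, `build_pairs` and the `EXAMPLE_UNITS` data are hypothetical.

```python
import random
import re
from typing import Any

def split_identifier(text: str) -> list[str]:
    """Split a name such as 'restartWebServer' or 'Install nginx_pkg' into lowercase subtokens."""
    # Insert a space at camelCase boundaries, then split on any non-alphanumeric run.
    text = re.sub(r"([a-z0-9])([A-Z])", r"\1 \2", text)
    return [tok.lower() for tok in re.split(r"[^A-Za-z0-9]+", text) if tok]

def linearize(node: Any) -> list[str]:
    """Pre-order traversal of a parsed YAML/JSON tree into a flat token sequence."""
    tokens: list[str] = []
    if isinstance(node, dict):
        for key, value in node.items():
            tokens.append(str(key))          # mapping keys kept whole, like AST node labels
            tokens.extend(linearize(value))  # followed by the key's subtree
    elif isinstance(node, list):
        for item in node:
            tokens.extend(linearize(item))
    elif node is not None:
        tokens.extend(split_identifier(str(node)))  # scalar leaves split into subtokens
    return tokens

def build_pairs(units: list[dict], seed: int = 0) -> list[tuple[list[str], list[str], int]]:
    """Return (name_tokens, body_tokens, label) triples: 1 = consistent, 0 = mismatched."""
    if len(units) < 2:
        raise ValueError("need at least two code units to borrow mismatched names")
    rng = random.Random(seed)
    pairs = []
    for i, unit in enumerate(units):
        body = linearize(unit["body"])
        pairs.append((split_identifier(unit["name"]), body, 1))
        # Hypothetical negative example: the same body paired with another unit's name.
        j = rng.choice([k for k in range(len(units)) if k != i])
        pairs.append((split_identifier(units[j]["name"]), body, 0))
    return pairs

# Two toy Ansible-style tasks (hypothetical data, not from the paper's dataset).
EXAMPLE_UNITS = [
    {"name": "Install nginx", "body": {"apt": {"name": "nginx", "state": "present"}}},
    {"name": "Start nginx service", "body": {"service": {"name": "nginx", "state": "started"}}},
]

if __name__ == "__main__":
    for name, body, label in build_pairs(EXAMPLE_UNITS):
        print(label, name, body)
```

In a full pipeline, the name and body token sequences would then be mapped to word embeddings and fed to a deep learning classifier; that step is omitted here. On a real dataset one would also skip borrowed names that happen to be identical to the original, since such "negatives" are actually consistent.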
