Exploring Code Style Transfer with Neural Networks

Style is a significant component of natural language text: the tone of a passage can change while the underlying information stays the same. Programming languages have strict syntax rules, yet they also have style, because the same functionality can be written using different language features. Programming style is difficult to quantify, however, so as part of this work we define style attributes specifically for Python. To build this definition, we used hierarchical clustering, which captures style without requiring us to specify transformations. Beyond defining style, we explore how well a pre-trained code language model captures information about code style. To do this, we fine-tuned pre-trained code language models and evaluated their performance on code style transfer tasks.
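As an illustration of what "same functionality, different language features" can look like in Python, the sketch below compares two equivalent functions and counts a few syntactic constructs in each. The snippets, the `STYLE_NODES` set, and the helper names `style_counts` and `run` are assumptions made for this example only; they are not the style attributes or the method defined in this work.

```python
import ast
from collections import Counter

# Two hypothetical snippets with identical behavior but different style.
LOOP_STYLE = """
def squares(values):
    result = []
    for v in values:
        result.append(v * v)
    return result
"""

COMPREHENSION_STYLE = """
def squares(values):
    return [v * v for v in values]
"""

# Assumed set of constructs to count; chosen only for illustration.
STYLE_NODES = (ast.For, ast.While, ast.ListComp, ast.GeneratorExp, ast.Lambda)


def style_counts(source):
    """Count occurrences of the selected AST node types in a snippet."""
    tree = ast.parse(source)
    return Counter(
        type(node).__name__
        for node in ast.walk(tree)
        if isinstance(node, STYLE_NODES)
    )


def run(source, argument):
    """Execute a snippet and call its `squares` function."""
    namespace = {}
    exec(source, namespace)
    return namespace["squares"](argument)


if __name__ == "__main__":
    print(style_counts(LOOP_STYLE))
    print(style_counts(COMPREHENSION_STYLE))
    print(run(LOOP_STYLE, [1, 2, 3]) == run(COMPREHENSION_STYLE, [1, 2, 3]))
```

Feature counts of this kind are one possible input to a clustering step; which features to use and how to weight them is left open here.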
