Paper Information - A Survey of Machine Learning for Big Code and Naturalness

Research at the intersection of machine learning, programming languages, and software engineering has recently taken important steps in proposing learnable probabilistic models of source code that exploit the abundance of patterns of code. In this article, we survey this work. We contrast programming languages against natural languages and discuss how these similarities and differences drive the design of probabilistic models. We present a taxonomy based on the underlying design principles of each model and use it to navigate the literature. Then, we review how researchers have adapted these models to application areas and discuss cross-cutting and application-specific challenges and opportunities.
