Leveraging Unsupervised Learning to Summarize APIs Discussed in Stack Overflow

Automated source code summarization is the task of generating summarized information about the purpose, usage, and/or implementation of methods and classes to help developers understand these code entities. Multiple supervised and unsupervised learning approaches have been proposed for code summarization; however, they have mostly focused on generating a summary for a piece of code. In addition, very few works have leveraged unofficial documentation. This paper proposes a novel automatic approach for summarizing Android API methods discussed in Stack Overflow, which we treat as unofficial documentation in this research. Our approach takes an API method's name as input and generates a natural language summary based on the Stack Overflow discussions of that method. We conducted a survey with 16 Android developers to evaluate the quality of our automatically generated summaries and to compare them with the official Android documentation. Our results show that, while developers generally find the official documentation more useful, the generated summaries are also competitive, particularly in offering implementation details, and can serve as a complementary source for guiding developers in software development and maintenance tasks.
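The abstract does not spell out the summarization pipeline. The sketch below illustrates one common unsupervised, extractive technique that fits the described input and output: a TextRank-style ranker that takes an API method name plus a set of discussion texts, keeps sentences mentioning the method, links them in a graph weighted by cosine similarity of term-frequency vectors, scores them with PageRank, and returns the top-k sentences. This is an assumption for illustration only, not the authors' method. The function names (`summarize_api`, `pagerank`, `cosine`, `tokenize`), the sentence splitter, the damping factor of 0.85, and the convention that posts are plain strings already fetched from Stack Overflow are all hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Lowercase word tokens; identifiers such as "settext" stay as one token.
    return re.findall(r"[a-z0-9_]+", text.lower())


def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity between two term-frequency vectors.
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def pagerank(weights: list[list[float]], damping: float = 0.85, iters: int = 50) -> list[float]:
    # Weighted PageRank by power iteration over a sentence-similarity graph.
    # Sentences with no similar neighbours (dangling nodes) leak mass,
    # which is acceptable here because only the relative ranking is used.
    n = len(weights)
    scores = [1.0 / n] * n
    out_sums = [sum(row) for row in weights]
    for _ in range(iters):
        new = []
        for i in range(n):
            rank = sum(weights[j][i] / out_sums[j] * scores[j]
                       for j in range(n) if out_sums[j] > 0)
            new.append((1 - damping) / n + damping * rank)
        scores = new
    return scores


def summarize_api(method_name: str, posts: list[str], k: int = 2) -> list[str]:
    # Hypothetical filtering step: keep only sentences that mention the method.
    sentences = [s.strip() for post in posts
                 for s in re.split(r"(?<=[.!?])\s+", post) if s.strip()]
    relevant = [s for s in sentences if method_name.lower() in s.lower()]
    if len(relevant) <= k:
        return relevant
    vectors = [Counter(tokenize(s)) for s in relevant]
    n = len(relevant)
    # Fully connected graph; edge weight = cosine similarity, no self-loops.
    weights = [[cosine(vectors[i], vectors[j]) if i != j else 0.0 for j in range(n)]
               for i in range(n)]
    scores = pagerank(weights)
    top = sorted(range(n), key=lambda i: scores[i], reverse=True)[:k]
    # Emit the selected sentences in their original discussion order.
    return [relevant[i] for i in sorted(top)]
```

For example, `summarize_api("setText", posts, k=2)` returns at most two sentences about `setText`, in the order they appeared. Extractive ranking of this kind needs no labelled training data, which is what makes it unsupervised; a real pipeline would likely add preprocessing such as stripping code blocks and weighting accepted answers, none of which is modelled here.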
