Generating Concept based API Element Comparison Using a Knowledge Graph

Developers often need to compare similar APIs to understand their commonalities and (often subtle) differences. Our empirical study of Stack Overflow questions and API documentation confirms that API comparison questions are common and can often be answered with knowledge contained in API reference documentation. The study also identifies eight types of API statements that are useful for API comparison. Based on these findings, we propose APIComp, a knowledge graph based approach that automatically extracts API knowledge from API reference documentation to support the comparison of a pair of API classes or methods from different aspects. The approach consists of an offline phase, which constructs an API knowledge graph, and an online phase, which generates an API comparison result for a given pair of API elements. Our evaluation shows that the quality of the different kinds of knowledge extracted into the API knowledge graph is generally high. Furthermore, the comparison results generated by APIComp are significantly better than those generated by a baseline approach based on heuristic rules and text similarity, and they are useful in helping developers with API selection tasks.
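
The two-phase idea can be illustrated with a minimal sketch. It is not the APIComp implementation: the triple format, the aspect names, and the functions `build_graph` and `compare` are hypothetical, and the statements are entered by hand here, whereas the approach described above extracts them automatically from reference documentation.

```python
from collections import defaultdict

# Hypothetical hand-written input: (API element, aspect, statement) triples.
# In the approach described above, such knowledge is extracted from API
# reference documentation rather than typed in.
TRIPLES = [
    ("java.lang.StringBuilder", "category", "mutable sequence of characters"),
    ("java.lang.StringBuffer", "category", "mutable sequence of characters"),
    ("java.lang.StringBuilder", "thread-safety", "not synchronized"),
    ("java.lang.StringBuffer", "thread-safety", "synchronized"),
]


def build_graph(triples):
    """Offline phase (toy version): index statements by element and aspect."""
    graph = defaultdict(lambda: defaultdict(set))
    for element, aspect, statement in triples:
        graph[element][aspect].add(statement)
    return graph


def compare(graph, a, b):
    """Online phase (toy version): per aspect, split statements into
    those shared by both elements and those specific to one of them."""
    facts_a = graph.get(a, {})
    facts_b = graph.get(b, {})
    result = {}
    for aspect in sorted(set(facts_a) | set(facts_b)):
        sa = facts_a.get(aspect, set())
        sb = facts_b.get(aspect, set())
        result[aspect] = {
            "common": sa & sb,
            "only_a": sa - sb,
            "only_b": sb - sa,
        }
    return result


if __name__ == "__main__":
    graph = build_graph(TRIPLES)
    report = compare(graph, "java.lang.StringBuilder", "java.lang.StringBuffer")
    for aspect, parts in report.items():
        print(aspect, parts)
```

Grouping statements by aspect is what lets the comparison report commonalities and differences side by side; exact set matching is used here only for simplicity and would miss statements that are worded differently but mean the same thing.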
