Improving reusability of software libraries through usage pattern mining

Abstract: Modern software systems are increasingly dependent on third-party libraries. It is widely recognized that using mature and well-tested third-party libraries can improve developers’ productivity, reduce time-to-market, and produce more reliable software. Today’s open-source repositories provide a wide range of libraries that can be freely downloaded and used. However, because software libraries are documented separately even though they are intended to be used together, developers are unlikely to take full advantage of these reuse opportunities. In this paper, we present a novel approach to automatically identify third-party library usage patterns, i.e., collections of libraries that are commonly used together by developers. Our approach employs a hierarchical clustering technique to group software libraries based on external client usage. To evaluate our approach, we mined a large set of over 6,000 popular libraries from the Maven Central Repository and investigated their usage by over 38,000 client systems hosted on GitHub. Our experiments show that our technique is able to detect the majority (77%) of highly consistent and cohesive library usage patterns across a considerable number of client systems.
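
To illustrate the idea of grouping libraries by shared client usage, the following is a minimal sketch and not the implementation evaluated in the paper. It assumes that each library is represented by the set of client systems that depend on it, that dissimilarity is measured with the Jaccard distance, and that clusters are merged bottom-up with complete linkage until a distance threshold is exceeded. The function names, the `max_distance` parameter, and the toy data are all hypothetical.

```python
from itertools import combinations


def jaccard_distance(a, b):
    """Jaccard distance between two sets of client identifiers."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def cluster_libraries(usage, max_distance=0.5):
    """Group libraries that are used by similar sets of clients.

    usage: dict mapping a library name to the set of clients using it.
    max_distance: hypothetical cut-off; merging stops once the closest
    pair of clusters is farther apart than this value.
    Returns the clusters that contain at least two libraries.
    """
    # Start with one singleton cluster per library (agglomerative scheme).
    clusters = [frozenset([lib]) for lib in sorted(usage)]

    def linkage(c1, c2):
        # Complete linkage: distance of the least similar pair (assumption).
        return max(jaccard_distance(usage[x], usage[y]) for x in c1 for y in c2)

    while len(clusters) > 1:
        # Find the closest pair of clusters.
        (i, j), dist = min(
            (((i, j), linkage(clusters[i], clusters[j]))
             for i, j in combinations(range(len(clusters)), 2)),
            key=lambda item: item[1],
        )
        if dist > max_distance:
            break
        merged = clusters[i] | clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)

    return [set(c) for c in clusters if len(c) > 1]


# Toy data (made up for illustration): library -> clients that depend on it.
usage = {
    "junit": {"c1", "c2", "c3", "c4"},
    "hamcrest": {"c1", "c2", "c3", "c4"},
    "mockito": {"c1", "c2", "c3"},
    "slf4j": {"c5", "c6"},
    "logback": {"c5", "c6"},
    "guava": {"c7"},
}

patterns = cluster_libraries(usage)
for pattern in patterns:
    print(sorted(pattern))
```

In this toy example, libraries whose client sets overlap heavily end up in the same group, while a library used by an unrelated client stays on its own and is not reported as a pattern. A lower `max_distance` would yield smaller, more tightly co-used groups.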
