Query reformulation by leveraging crowd wisdom for scenario-based software search

Internet-scale open source software (OSS) production in various communities is generating abundant reusable resources for software developers. However, retrieving and reusing mature software that fits a need from a huge number of candidates remains a great challenge: there are usually large gaps between users' application contexts (which are often used as queries) and the OSS keywords (which are often used to match the queries). In this paper, we define the scenario-based query problem for OSS retrieval and propose a novel approach that reformulates the raw query by leveraging the crowd wisdom of millions of developers to improve retrieval results. We build a software-specific domain lexical database from the knowledge in open source communities and use it to expand and optimize input queries. The experimental results show that our approach reformulates the initial query effectively and significantly outperforms existing search engines at finding mature software.
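To make the idea of lexicon-based query expansion concrete, the following is a minimal sketch, not the method evaluated in this paper. The lexicon contents, the weights, the function name `expand_query`, and the `top_k` cutoff are all hypothetical and chosen only for illustration; a real lexical database would be mined from open source community data.

```python
from typing import Dict, List

# Hypothetical toy lexicon: each scenario word maps to related software
# terms with an assumed relatedness weight (illustrative values only).
LEXICON: Dict[str, Dict[str, float]] = {
    "chat": {"messaging": 0.9, "xmpp": 0.7, "websocket": 0.6},
    "website": {"cms": 0.8, "web-framework": 0.7},
}


def expand_query(query: str, lexicon: Dict[str, Dict[str, float]],
                 top_k: int = 2) -> List[str]:
    """Append the top_k most related lexicon terms for each query word."""
    terms = query.lower().split()
    expanded = list(terms)  # keep the original words first
    for term in terms:
        related = lexicon.get(term, {})
        # Sort by descending weight, breaking ties alphabetically.
        ranked = sorted(related.items(), key=lambda kv: (-kv[1], kv[0]))
        for candidate, _weight in ranked[:top_k]:
            if candidate not in expanded:
                expanded.append(candidate)
    return expanded


if __name__ == "__main__":
    print(expand_query("build chat website", LEXICON))
```

In this sketch a scenario-style query such as "build chat website" gains terms like "messaging" and "cms", which are closer to the keywords that software projects use to describe themselves.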
