A search engine for finding and reusing architecturally significant code

A code search engine that automates the discovery, extraction and indexing of tactics.
A big-data-compatible architecture for searching through 22 million source files.
Novel techniques to detect tactics and the technical context in which they are used.
A novel ranking algorithm to order the retrieved tactical files.
An improvement over state-of-the-art code search engines in finding tactical code.

Architectural tactics are the building blocks of software architecture. They describe solutions for addressing specific quality concerns, and are prevalent across many software systems. Once a decision is made to use a tactic, the developer must devise a concrete plan for writing the code that implements it. Unfortunately, this is a non-trivial task even for experienced developers. Developers often resort to search engines, crowd-sourcing websites, or discussion forums to find sample code snippets that implement a tactic. A fundamental problem in finding implementations of architectural tactics/patterns is the mismatch between the high-level intent reflected in the descriptions of these patterns and their low-level implementation details. To reduce this mismatch, we created a novel tactic search engine called ArchEngine (ARCHitecture search ENGINE). ArchEngine can replace this manual internet-based search process and help developers find and reuse tactical code from a wide range of open source systems. It helps developers find implementation examples of an architectural tactic for a given technical context. It uses information retrieval and program analysis techniques to retrieve applications that implement these design concepts. Furthermore, it lists and ranks the code snippets where the patterns/tactics are located.
Our case study with 21 graduate students (whose experience was comparable to that of junior software developers) shows that ArchEngine is more effective than other search engines (e.g., Krugle and Koders) in helping programmers quickly find implementations of architectural tactics/patterns.
