Intelligent search techniques for large software systems

ACKNOWLEDGEMENTS

I would like to acknowledge the help that I have received during my research. Grateful thanks to:

• Dr. Timothy Lethbridge, my supervisor, for his support, guidance, patience and intelligent comments.
• The KBRE group, for their help, their comments, and the valuable discussions I had with them.
• The software engineers who participated in this study.
• My friends, for their concern and encouragement.
• My family, for their endless support.

ABSTRACT

There are many tools available today to help software engineers search source code. Often, however, there is a gap between what people really want to find and the query strings they actually specify. This is because a concept in a software system may be represented by many different terms, while the same term may have different meanings in different places. As a result, software engineers often have to guess when they specify a search, and often have to search repeatedly before finding what they want.

To alleviate this search problem, this thesis describes a study of what we call intelligent search techniques, as implemented in a software exploration environment whose purpose is to facilitate software maintenance. We propose using information retrieval techniques to apply transformations to query strings automatically. The thesis first introduces the intelligent search techniques used in our study, including abbreviation concatenation and abbreviation expansion. It then describes in detail the rating algorithms used to evaluate how similar the query results are to the original query strings. Next, we describe a series of experiments we conducted to assess the effectiveness of both the intelligent search methods and our rating algorithms. Finally, we describe how we use the analysis of the experimental results to recommend an effective combination of search techniques for software maintenance and to guide our future research.
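
The abstract does not spell out the transformation or rating algorithms, so the following is only a minimal Python sketch of the general idea under our own assumptions, not the method evaluated in the thesis. The abbreviation table, the weights (1.0 for a verbatim word match, 0.75 for a match through an abbreviation) and the function names split_identifier, expand_abbreviations, transform_query, rate and search are all hypothetical. The sketch expands or abbreviates each query word, concatenates the variants into candidate search strings, and ranks the identifiers they match by a simple word-overlap score.

```python
import re

# Hypothetical abbreviation table (assumption): a real tool might mine
# such pairs from the code base or from a domain dictionary.
ABBREVIATIONS = {
    "msg": "message",
    "buf": "buffer",
    "init": "initialize",
    "ptr": "pointer",
}


def split_identifier(name):
    """Split 'sendMsgBuf', 'send_msg_buf' or 'HTTPServer' into lowercase words."""
    parts = re.findall(r"[A-Z]?[a-z]+|[A-Z]+(?![a-z])|\d+", name)
    return [p.lower() for p in parts]


def expand_abbreviations(word):
    """Return the word together with any known expansion or abbreviation of it."""
    variants = {word}
    if word in ABBREVIATIONS:
        variants.add(ABBREVIATIONS[word])
    for abbr, full in ABBREVIATIONS.items():
        if word == full:
            variants.add(abbr)
    return variants


def transform_query(query):
    """Build alternative search strings from a multi-word query.

    Every word is replaced by each of its variants, and each resulting
    word sequence is concatenated both with and without underscores.
    """
    combos = [[]]
    for word in query.lower().split():
        combos = [c + [v] for c in combos for v in sorted(expand_abbreviations(word))]
    patterns = set()
    for combo in combos:
        patterns.add("".join(combo))   # e.g. 'sendmsg'
        patterns.add("_".join(combo))  # e.g. 'send_msg'
    return patterns


def rate(query, identifier):
    """Score in [0, 1] of how closely an identifier matches the query.

    Hypothetical weights: 1.0 per query word found verbatim,
    0.75 per word found only through an abbreviation variant.
    """
    query_words = query.lower().split()
    if not query_words:
        return 0.0
    id_words = set(split_identifier(identifier))
    score = 0.0
    for word in query_words:
        if word in id_words:
            score += 1.0
        elif expand_abbreviations(word) & id_words:
            score += 0.75
    return score / len(query_words)


def search(query, identifiers):
    """Return identifiers containing any transformed query, best-rated first."""
    patterns = transform_query(query)
    hits = [n for n in identifiers if any(p in n.lower() for p in patterns)]
    # sorted() is stable, so equally rated hits keep their input order.
    return sorted(hits, key=lambda n: rate(query, n), reverse=True)


if __name__ == "__main__":
    names = ["sendMsgBuf", "sendMessage", "receiveData", "send_msg"]
    print(search("send message", names))
```

Running the script prints ['sendMessage', 'sendMsgBuf', 'send_msg']: a plain substring search for "send message" would find none of these identifiers, while the transformed queries find three, and the verbatim match outranks the two found only through the abbreviation "msg".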

Many researchers in the reverse engineering community have shown that search is a major task of software maintenance [14]. In large software systems, searching the source code is even more time-consuming, which makes program comprehension even more difficult. Tools that target the search problems in source code should therefore help improve the effectiveness of program comprehension. Search tools, such as Unix 'grep' or the facilities provided with code exploration systems such as Source Navigator [25], are used every day by many software engineers to facilitate program comprehension. Even with the regular search …

[1] Betty Kirkpatrick et al. Roget's Thesaurus. 1852.

[2] Graeme Hirst et al. Lexical Cohesion Computed by Thesaural Relations as an Indicator of the Structure of Text. 1991, CL.

[3] Nicolas Anquetil et al. Extracting Concepts from File Names: A New File Clustering Criterion. 1998, Proceedings of the 20th International Conference on Software Engineering.

[4] George A. Miller et al. Introduction to WordNet: An On-line Lexical Database. 1990.

[5] Timothy Lethbridge et al. A Little Knowledge Can Go a Long Way Towards Program Understanding. 1997, Proceedings Fifth International Workshop on Program Comprehension (IWPC'97).

[6] Andrian Marcus et al. Supporting Program Comprehension Using Semantic and Structural Information. 2001, Proceedings of the 23rd International Conference on Software Engineering (ICSE 2001).

[7] Timothy C. Lethbridge et al. Studies of the Work Practices of Software Engineers. 2002.

[8] Ian Sommerville et al. An Information Retrieval System for Software Components. 1988, SIGF.

[9] Timothy C. Lethbridge et al. Architecture of a Source Code Exploration Tool: A Software Engineering Case Study. 1997.

[10] Gerard Salton et al. Improving Retrieval Performance by Relevance Feedback. 1997, J. Am. Soc. Inf. Sci.

[11] C. Boldyreff et al. Reuse, Software Concepts, Descriptive Methods, and the Practitioner Project. 1989, SOEN.

[12] Ricardo A. Baeza-Yates et al. Introduction to Data Structures and Algorithms Related to Information Retrieval. 1992, Information Retrieval: Data Structures & Algorithms.

[13] William B. Frakes et al. Software Reuse Through Information Retrieval. 1986, SIGF.

[14] John Bolstad. A Proposed Classification Scheme for Computer Program Libraries. 1975, SGNM.

[15] Sergey Brin et al. The Anatomy of a Large-Scale Hypertextual Web Search Engine. 1998, Comput. Networks.

[16] Louise T. Su. The Relevance of Recall and Precision in User Evaluation. 1994, J. Am. Soc. Inf. Sci.