Hybrid storage for enabling fully-featured text search and fine-grained structural search over source code

Searching is an important activity in software maintenance. Dedicated data structures have been used to support either textual or structural queries over source code. The goal of this ongoing research is to develop a hybrid data store that supports textual and structural search simultaneously. The approach combines naive adjacency lists with an inverted index. The data model is further enhanced with recent compression techniques for column-oriented databases, which allow lossless yet compact storage of fine-grained structural data. Graph indexing enables the proposed data model to answer fine-grained structural queries quickly. This paper describes the basics of the proposed approach and assesses its feasibility.
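To make the combination of adjacency lists and an inverted index concrete, the following is a minimal sketch and not the paper's implementation. The class `HybridStore`, its method names, the node kinds, and the edge labels are all hypothetical. The run-length encoder is only one illustrative column-compression scheme, assumed here for demonstration; the abstract does not say which schemes are actually used.

```python
from collections import defaultdict
import re


class HybridStore:
    """Toy hybrid store (hypothetical): adjacency lists for structure,
    an inverted index for text."""

    def __init__(self):
        self.kind = {}                      # node id -> node kind, e.g. "method"
        self.out_edges = defaultdict(list)  # node id -> [(edge label, target id)]
        self.postings = defaultdict(set)    # token -> ids of nodes containing it

    def add_node(self, node_id, kind, text=""):
        self.kind[node_id] = kind
        # Index every identifier-like token of the node's text.
        for token in re.findall(r"[A-Za-z_]\w*", text.lower()):
            self.postings[token].add(node_id)

    def add_edge(self, source, label, target):
        self.out_edges[source].append((label, target))

    def text_search(self, token):
        """Purely textual query: ids of nodes whose text contains `token`."""
        return set(self.postings.get(token.lower(), set()))

    def search(self, token, kind, edge_label, target_kind):
        """Combined query: nodes of `kind` whose text contains `token` and
        that have an `edge_label` edge to some node of `target_kind`."""
        hits = []
        for node_id in sorted(self.text_search(token)):
            if self.kind[node_id] != kind:
                continue
            edges = self.out_edges.get(node_id, [])
            if any(label == edge_label and self.kind.get(target) == target_kind
                   for label, target in edges):
                hits.append(node_id)
        return hits


def run_length_encode(column):
    """Lossless run-length encoding of a column of repeated values
    (an assumed example of column-store compression)."""
    runs = []
    for value in column:
        if runs and runs[-1][0] == value:
            runs[-1][1] += 1
        else:
            runs.append([value, 1])
    return [(value, count) for value, count in runs]


# Hypothetical sample data: one class and three methods.
store = HybridStore()
store.add_node(1, "class", "class Parser")
store.add_node(2, "method", "parse tokens")
store.add_node(3, "method", "read file")
store.add_node(4, "method", "parse header")
store.add_edge(1, "declares", 2)
store.add_edge(2, "calls", 3)
store.add_edge(4, "calls", 1)

# Methods mentioning "parse" that call another method.
result = store.search("parse", "method", "calls", "method")
```

In this sketch the inverted index narrows the candidate set first, and the adjacency lists are consulted only for the surviving nodes. A real system would presumably replace the linear edge scan with the graph index mentioned in the abstract.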
