Inferring the data access from the clients of generic APIs

Many programs access external data sources through generic APIs. The class hierarchy of such a generic API does not reflect the schema of any particular data source, so it is hard to tell what data an API client accesses and how it obtains that data. This makes API clients difficult to maintain. In this paper, we show that the data access of an API client can be recovered through static analysis of the client's source code. We provide a formal yet intuitive representation of the data access as a graph of so-called summoning snippets. Each snippet stands for a type of data accessed by the client and carries the code slice from the client that shows how to obtain the data via the API. We provide an automated approach, based on points-to analysis and code slicing, to infer a complete and well-simplified set of summoning snippets from the client source code. We implement this approach as a development assistant tool and evaluate it on eight open source data processing programs, obtaining an average precision of 89% and an average recall of 95%. Further inspection of these clients, together with a user study on writing data access code for their data sources, shows that the inference results are useful both for inspecting existing clients and for developing new data access logic.
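To make the idea of a summoning snippet concrete, the following is a minimal sketch, not the representation or the inference algorithm used in the paper. It assumes Python's `xml.etree.ElementTree` as the generic API and an invented library/book/title schema. The `Snippet` class, the `paths` helper, and the snippet graph are hypothetical and written by hand here, whereas the paper infers such snippets automatically from the client code. For simplicity the graph is a tree.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field

# Hypothetical data source; the schema is invented for illustration.
DOC = ("<library>"
       "<book year='2001'><title>A</title></book>"
       "<book year='2005'><title>B</title></book>"
       "</library>")


def client(xml_text):
    """A client of a generic API: every node is an Element, so the API's
    class hierarchy says nothing about libraries, books, or titles."""
    root = ET.fromstring(xml_text)
    result = []
    for book in root.findall("book"):
        year = book.get("year")
        title = book.find("title").text
        result.append((title, int(year)))
    return result


@dataclass
class Snippet:
    """Hypothetical summoning snippet: a data type accessed by the client
    plus the code slice that obtains it through the API."""
    data_type: str
    code: str
    children: list = field(default_factory=list)


# Hand-written snippets describing the data access of `client` above.
title_snippet = Snippet("Title", 'book.find("title").text')
year_snippet = Snippet("Year", 'book.get("year")')
book_snippet = Snippet("Book", 'root.findall("book")',
                       [title_snippet, year_snippet])
library_snippet = Snippet("Library", "ET.fromstring(xml_text)", [book_snippet])


def paths(snippet, prefix=()):
    """Yield the chain of data types leading to each snippet in the graph."""
    here = prefix + (snippet.data_type,)
    yield here
    for child in snippet.children:
        yield from paths(child, here)
```

Reading the snippets from `library_snippet` downward recovers what the generic `Element` type hides: which kinds of data the client touches and which API calls reach each of them.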
