Reverse engineering language product lines from existing DSL variants

The use of domain-specific languages (DSLs) has become a successful technique for developing complex systems. Moreover, we can find different DSL variants, adapted to specific purposes, that share some features. The challenge for language designers is to take advantage of the commonalities between DSL variants by reusing previously defined language constructs [7]. To tackle this, the research community in software language engineering has proposed applying Software Product Line (SPL) techniques to the construction of DSLs [4, 6], leading to the notion of Language Product Lines (LPLs) [3, 7]. As with software product lines, we find systems that started with a single variant and were forked each time a variant with new language constructs was needed. In such cases, LPLs can be built from a set of existing DSL variants through reverse-engineering techniques [2]. First, those techniques should be able to recover a modular language design that encodes all language constructs existing in all DSL variants. Second, they should synthesize the variability models that represent the common and variant parts of the LPL. In previous work [4], we introduced an approach to automatically infer a modular language design from a given set of DSL variants. In the paper presented here [5], we describe a complete reverse-engineering technique that produces not only the modular language design but the entire language product line. Concretely, we show how to reverse engineer the variability of both the abstract syntax and the semantics, specifying it in terms of well-known formalisms, i.e., feature models (FMs) and orthogonal variability models (OVMs), and considering the multiple dimensions that such variability may present. Moreover, we show how those variability models can be exploited to configure and assemble new DSL variants. In this paper, we also present how we have relied on this technique within an industrial project, which is composed of three variants of a DSL for finite-state machines [1].
In this project, we manually developed an oracle to know in advance the existing variation points. Then, we executed our approach on these DSL variants and compared the produced results against the expected ones, showing that our approach is capable of correctly identifying commonalities and variability.
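To make the idea of separating commonalities from variability concrete, the following is a minimal sketch and not the technique of the paper itself. It assumes that each DSL variant can be reduced to a plain set of construct names, and it uses simple set operations where the actual approach compares abstract syntax and semantics. The variant names (`fsm_a`, `fsm_b`, `fsm_c`), the construct names, and the function `infer_commonality` are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def infer_commonality(variants):
    """Split constructs into common and variable ones.

    `variants` maps a (hypothetical) variant name to the set of construct
    names it defines. Returns (common, variable, groups), where `groups`
    maps each set of owning variants to the constructs shared by exactly
    those variants.
    """
    if not variants:
        raise ValueError("at least one DSL variant is required")

    all_constructs = set().union(*variants.values())
    common = set.intersection(*variants.values())

    # Constructs owned by exactly the same variants are candidates for
    # living in the same language module.
    groups = defaultdict(set)
    for construct in all_constructs:
        owners = frozenset(
            name for name, constructs in variants.items()
            if construct in constructs
        )
        groups[owners].add(construct)

    return common, all_constructs - common, dict(groups)


# Illustrative input only: three made-up finite-state machine variants.
variants = {
    "fsm_a": {"State", "Transition", "Trigger"},
    "fsm_b": {"State", "Transition", "Trigger", "CompositeState"},
    "fsm_c": {"State", "Transition", "Trigger", "CompositeState",
              "TimedTransition"},
}

common, variable, groups = infer_commonality(variants)
print("common:", sorted(common))
print("variable:", sorted(variable))
for owners, constructs in groups.items():
    print(sorted(owners), "->", sorted(constructs))
```

In this toy setting, constructs present in every variant would become mandatory features of a variability model, and each remaining group would become a candidate optional feature. A hand-written oracle, as used in the industrial project, plays the role of the expected `common` and `variable` sets against which such output is compared.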