Detecting Duplications in Sequence Diagrams Based on Suffix Trees

With the popularity of UML and MDA, models are replacing source code as the core artifacts of software development and maintenance. However, duplications in models reduce their maintainability and reusability, and removing such duplications first requires detecting them. As an initial step toward addressing this problem, we propose an approach to detect duplications in sequence diagrams. Through special preprocessing, we convert two-dimensional sequence diagrams into a one-dimensional array and then construct a suffix tree of the array. We revise the traditional suffix-tree construction algorithm by proposing a special algorithm for detecting common prefixes of suffixes. This algorithm ensures that every duplication detected with the suffix tree can be extracted into a separate, reusable sequence diagram. The duplications found in the suffix tree then serve as refactoring candidates. With tool support, the proposed approach has been applied to real industrial projects, and the evaluation results suggest that it is effective.
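The abstract does not spell out the preprocessing or the revised common-prefix algorithm, so what follows is only a minimal Python sketch of the general idea under stated assumptions. It assumes each message of a sequence diagram has already been serialized into a string token such as "A->B:login" (a hypothetical encoding). It also replaces the suffix tree with a naive, uncompressed suffix trie built in O(n²) time. A repeated token run is reported as a candidate duplication when it occurs at two or more positions and cannot be extended to the right without losing an occurrence. Unlike the approach described above, the sketch does not check whether a candidate can be extracted into a separate sequence diagram.

```python
from typing import Dict, List, Sequence, Tuple


class TrieNode:
    """Node of an uncompressed suffix trie over message tokens."""

    def __init__(self) -> None:
        self.children: Dict[str, "TrieNode"] = {}
        # Start indices of all suffixes whose path passes through this node.
        self.starts: List[int] = []


def build_suffix_trie(tokens: Sequence[str]) -> TrieNode:
    """Insert every suffix tokens[i:] into a trie (naive O(n^2) construction)."""
    root = TrieNode()
    for i in range(len(tokens)):
        node = root
        for tok in tokens[i:]:
            node = node.children.setdefault(tok, TrieNode())
            node.starts.append(i)
    return root


def find_duplicates(
    tokens: Sequence[str], min_len: int = 2
) -> List[Tuple[Tuple[str, ...], List[int]]]:
    """Return right-maximal repeated token runs (length >= min_len) with start positions.

    Hypothetical helper: a common prefix of two or more suffixes is a repeat.
    """
    root = build_suffix_trie(tokens)
    results = []
    stack = [(root, ())]
    while stack:
        node, path = stack.pop()
        for tok, child in node.children.items():
            if len(child.starts) < 2:
                continue  # occurs only once, so it is not a duplication
            child_path = path + (tok,)
            # Right-maximal: no single-token extension keeps every occurrence.
            extendable = any(
                len(g.starts) == len(child.starts) for g in child.children.values()
            )
            if len(child_path) >= min_len and not extendable:
                results.append((child_path, sorted(child.starts)))
            stack.append((child, child_path))
    # Longest candidates first, then by position.
    return sorted(results, key=lambda r: (-len(r[0]), r[1]))


if __name__ == "__main__":
    trace = [
        "A->B:login", "B->C:query", "C->B:result", "B->A:ok",
        "A->B:login", "B->C:query", "C->B:result", "B->A:fail",
    ]
    for run, starts in find_duplicates(trace):
        print(starts, run)
```

Here `build_suffix_trie` and `find_duplicates` are illustrative names, not part of the described tool. On the example trace, the login/query/result run is reported at positions 0 and 4, together with its shorter right-maximal tail at positions 1 and 5. A practical tool would use a linear-time construction such as Ukkonen's algorithm and would add filtering so that only extractable fragments remain.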