Duplicate finder toolkit

Software documentation is a significant component of modern software systems. Each year it grows more complicated, just like the software itself. One factor that degrades documentation quality is the presence of textual duplicates. Textual duplicates in software documentation are inherently imprecise: within a single document, the same information may be presented many times with different levels of detail and in various contexts. Documentation maintenance is an acute problem, and there is strong demand for automation tools in this domain. In this study we present the Duplicate Finder Toolkit, a tool that assists an expert with duplicate-related maintenance tasks. The tool facilitates the maintenance process in four ways: 1) detection of both exact and near duplicates; 2) duplicate visualization via heat maps; 3) duplicate analysis, i.e. comparison of several duplicate instances, evaluation of their differences, and exploration of duplicate context; 4) duplicate manipulation and extraction.
