An empirical study of data constraint implementations in Java

J. M. Florez, Z. Zhang, S. Wei, and A. Marcus
Department of Computer Science, The University of Texas at Dallas, Richardson, TX, USA
E-mail: {jflorez, zenong, swei, amarcus}@utdallas.edu

L. Moreno
CQSE America, Santa Clara, CA, USA
E-mail: moreno@cqse-america.com

arXiv:2107.04720v1 [cs.SE] 9 Jul 2021

Software systems are designed according to guidelines and constraints defined by business rules. Some of these constraints define the allowable or required values for data handled by the systems. These data constraints usually originate from the problem domain (e.g., regulations), and developers must write code that enforces them. Understanding how data constraints are implemented is essential for testing, debugging, and software change. Unfortunately, there are no widely accepted guidelines or best practices on how to implement data constraints. This paper presents an empirical study that investigates how data constraints are implemented in Java. We study the implementation of 187 data constraints extracted from the documentation of eight real-world Java software systems. First, we perform a qualitative analysis of the textual descriptions of data constraints and identify four data constraint types. Second, we manually identify the implementations of these data constraints and reveal that they can be grouped into 30 implementation patterns. The analysis of these implementation patterns indicates that developers prefer a handful of patterns when implementing data constraints, and that deviations from these patterns are associated with unusual implementation decisions or code smells. Third, we develop a tool-assisted protocol that allows us to identify 256 additional trace links for the data constraints implemented using the 13 most common patterns. We find that almost half of these data constraints have multiple enforcing statements, which are code clones of different types.
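To make the terms "data constraint" and "enforcing statement" concrete, the following is a minimal, hypothetical sketch. The constraint ("a discount rate must be between 0 and 100 percent"), the class `DiscountPolicy`, and its methods are invented for illustration; they are not taken from the eight studied systems and do not reproduce any of the paper's 30 implementation patterns. The sketch shows one constraint enforced by two statements that repeat the same range check, which is the kind of multi-statement, cloned enforcement the abstract reports.

```java
// Hypothetical data constraint (invented for illustration):
// "A discount rate must be between 0 and 100 percent."
final class DiscountPolicy {
    static final int MIN_RATE = 0;
    static final int MAX_RATE = 100;

    private int rate;

    // Enforcing statement 1: a guard in the setter that rejects
    // out-of-range values by throwing an exception.
    void setRate(int rate) {
        if (rate < MIN_RATE || rate > MAX_RATE) {
            throw new IllegalArgumentException("rate must be in [0, 100]: " + rate);
        }
        this.rate = rate;
    }

    // Enforcing statement 2: the same range check repeated elsewhere
    // (a code clone), here used to validate raw input without throwing.
    static boolean isValidRate(int rate) {
        return rate >= MIN_RATE && rate <= MAX_RATE;
    }

    int getRate() {
        return rate;
    }

    public static void main(String[] args) {
        DiscountPolicy policy = new DiscountPolicy();
        policy.setRate(15);
        System.out.println(policy.getRate());     // 15
        System.out.println(isValidRate(150));     // false
        try {
            policy.setRate(-5);
        } catch (IllegalArgumentException e) {
            // prints "rejected: rate must be in [0, 100]: -5"
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

If the constraint changed (say, to allow rates up to 50 percent), both checks would have to be updated consistently; duplicated enforcement of this kind is one reason tracing a documented constraint to all of its enforcing statements matters for software change.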