The shape of feature code: an analysis of twenty C-preprocessor-based systems

Feature annotations (e.g., code fragments guarded by #ifdef C-preprocessor directives) control code extensions related to features. Feature annotations have long been said to be undesirable. When maintaining features that control many annotations, there is a high risk of ripple effects. Also, excessive use of feature annotations leads to code clutter, hinders program comprehension, and makes maintenance harder. To prevent such problems, developers should monitor the use of feature annotations, for example, by setting acceptable thresholds. However, little is known about how to extract thresholds in practice and which values are representative for feature-related metrics. To address this issue, we analyze the statistical distribution of three feature-related metrics collected from a corpus of 20 well-known and long-lived C-preprocessor-based systems from different domains. The three metrics are the scattering degree of feature constants, the tangling degree of feature expressions, and the nesting depth of preprocessor annotations. Our findings show that feature scattering is highly skewed; in 14 systems (70%), the scattering distributions match a power law, making averages and standard deviations unreliable as thresholds. Regarding tangling and nesting, the values tend to follow a uniform distribution; although outliers exist, they have little impact on the mean, suggesting that central statistical measures are reliable thresholds for tangling and nesting. Based on these findings, we propose thresholds derived from our benchmark data as a basis for further investigations.
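To make the three metrics concrete, the following Python sketch computes simplified versions of them from a piece of C source text. It is an illustration under stated assumptions, not the tooling used in the study: the definitions (scattering as the number of feature expressions that mention a constant, tangling as the number of distinct constants in one expression, nesting as the depth of nested conditional directives) are simplifications, the function name `feature_metrics` is hypothetical, and the feature names in the sample are invented. The line-based matching ignores comments, line continuations, and hexadecimal literals inside directives.

```python
import re
from collections import Counter

# Conditional preprocessor directives; longer keywords are listed first
# so that "#ifdef" is not matched as "#if".
DIRECTIVE = re.compile(r"^\s*#\s*(ifdef|ifndef|if|elif|else|endif)\b(.*)$")
IDENT = re.compile(r"[A-Za-z_]\w*")


def feature_metrics(source: str):
    """Return (scattering, tangling, nesting) for C source text.

    Assumed, simplified definitions:
    - scattering[c]: number of feature expressions that mention constant c
    - tangling: number of distinct constants in each feature expression
    - nesting: depth at which each #if/#ifdef/#ifndef opens
    """
    scattering = Counter()
    tangling = []
    nesting = []
    depth = 0
    for line in source.splitlines():
        match = DIRECTIVE.match(line)
        if not match:
            continue
        kind, expr = match.group(1), match.group(2)
        if kind in ("if", "ifdef", "ifndef"):
            depth += 1
            nesting.append(depth)
        elif kind == "endif":
            depth -= 1
        if kind in ("if", "ifdef", "ifndef", "elif"):
            # "defined" is an operator, not a feature constant.
            constants = set(IDENT.findall(expr)) - {"defined"}
            tangling.append(len(constants))
            scattering.update(constants)
    return scattering, tangling, nesting


# Hypothetical sample: FEATURE_A, FEATURE_B, FEATURE_C are invented names.
SAMPLE = """\
#ifdef FEATURE_A
int a;
#if defined(FEATURE_B) && defined(FEATURE_C)
int bc;
#endif
#endif
#ifdef FEATURE_A
int a2;
#endif
"""

scattering, tangling, nesting = feature_metrics(SAMPLE)
print("scattering:", dict(scattering))
print("tangling:", tangling)
print("max nesting:", max(nesting))
```

In this sample, FEATURE_A appears in two separate annotations, one expression combines two constants, and one annotation is nested inside another. Collecting these values across a whole code base yields the per-system distributions whose shape the analysis examines.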
