Do #ifdefs influence the occurrence of vulnerabilities? an empirical study of the linux kernel

Preprocessors support the diversification of software products with #ifdefs, but also require additional effort from developers to maintain and understand variable code. We conjecture that #ifdefs cause developers to produce more vulnerable code because they are required to reason about multiple features simultaneously and maintain complex mental models of dependencies of configurable code. We extracted a variational call graph across all configurations of the Linux kernel, and used configuration complexity metrics to compare vulnerable and non-vulnerable functions considering their vulnerability history. Our goal was to learn about whether we can observe a measurable influence of configuration complexity on the occurrence of vulnerabilities. Our results suggest, among others, that vulnerable functions have higher variability than non-vulnerable ones and are also constrained by fewer configuration options. This suggests that developers are inclined to notice functions appear in frequently-compiled product variants. We aim to raise developers' awareness to address variability more systematically, since configuration complexity is an important, but often ignored aspect of software product lines.

[1]  Mohammad Zulkernine,et al.  Can complexity, coupling, and cohesion metrics be used as early indicators of vulnerabilities? , 2010, SAC '10.

[2]  Sven Apel,et al.  Scalable analysis of variable software , 2013, ESEC/FSE 2013.

[3]  Krzysztof Czarnecki,et al.  Where Do Configuration Constraints Stem From? An Extraction Approach and an Empirical Study , 2015, IEEE Transactions on Software Engineering.

[4]  Vitaly Shmatikov,et al.  The most dangerous code in the world: validating SSL certificates in non-browser software , 2012, CCS.

[5]  Shari Lawrence Pfleeger,et al.  Software metrics (2nd ed.): a rigorous and practical approach , 1997 .

[6]  Andreas Zeller,et al.  Predicting vulnerable software components , 2007, CCS '07.

[7]  M. Kenward,et al.  An Introduction to the Bootstrap , 2007 .

[8]  Wolfgang Schröder-Preikschat,et al.  Is The Linux Kernel a Software Product Line , 2007 .

[9]  Keqing He,et al.  A qualitative method for measuring the structural complexity of software systems based on complex networks , 2005, 12th Asia-Pacific Software Engineering Conference (APSEC'05).

[10]  Gábor Csárdi,et al.  The igraph software package for complex network research , 2006 .

[11]  Márcio Ribeiro,et al.  The Love/Hate Relationship with the C Preprocessor: An Interview Study , 2015, ECOOP.

[12]  Andreas Zeller,et al.  Mining metrics to predict component failures , 2006, ICSE.

[13]  Sven Apel,et al.  Preprocessor-based variability in open-source and industrial software systems: An empirical study , 2016, Empirical Software Engineering.

[14]  Gunter Saake,et al.  Type checking annotation-based product lines , 2012, TSEM.

[15]  Klaus Pohl,et al.  Software Product Line Engineering - Foundations, Principles, and Techniques , 2005 .

[16]  Sven Apel,et al.  Characterizing complexity of highly-configurable systems with variational call graphs: analyzing configuration options interactions complexity in function calls , 2015, HotSoS.

[17]  Kyo Chul Kang,et al.  Feature-Oriented Domain Analysis (FODA) Feasibility Study , 1990 .

[18]  David A. Wagner,et al.  Intrusion detection via static analysis , 2001, Proceedings 2001 IEEE Symposium on Security and Privacy. S&P 2001.

[19]  Nachiappan Nagappan,et al.  Predicting defects using network analysis on dependency graphs , 2008, 2008 ACM/IEEE 30th International Conference on Software Engineering.

[20]  Klaus Pohl,et al.  Software product line testing , 2006, CACM.

[21]  Neil F. Johnson,et al.  Simply Complexity: A Clear Guide to Complexity Theory , 2007 .

[22]  Carsten Sinz,et al.  Configuration Lifting: Verification meets Software Configuration , 2008, 2008 23rd IEEE/ACM International Conference on Automated Software Engineering.

[23]  Bashar Nuseibeh,et al.  Feature interaction: the security threat from within software systems , 2008 .

[24]  Sven Apel,et al.  Exploring feature interactions in the wild: the new feature-interaction challenge , 2013, FOSD '13.

[25]  Gunter Saake,et al.  A Classification and Survey of Analysis Strategies for Software Product Lines , 2014, ACM Comput. Surv..

[26]  Sebastian Erdweg,et al.  Variability-aware parsing in the presence of lexical macros and conditional compilation , 2011, OOPSLA '11.

[27]  Thomas D. LaToza,et al.  Maintaining mental models: a study of developer work habits , 2006, ICSE.

[28]  Iago Abal,et al.  42 variability bugs in the linux kernel: a qualitative analysis , 2014, ASE.

[29]  Myra B. Cohen,et al.  Feature Interaction Faults Revisited: An Exploratory Study , 2011, 2011 IEEE 22nd International Symposium on Software Reliability Engineering.

[30]  Sven Apel,et al.  An analysis of the variability in forty preprocessor-based software product lines , 2010, 2010 ACM/IEEE 32nd International Conference on Software Engineering.

[31]  Sven Apel,et al.  From Developer Networks to Verified Communities: A Fine-Grained Approach , 2015, 2015 IEEE/ACM 37th IEEE International Conference on Software Engineering.

[32]  Shari Lawrence Pfleeger,et al.  Software Metrics : A Rigorous and Practical Approach , 1998 .

[33]  Sven Apel,et al.  Does the discipline of preprocessor annotations matter?: a controlled experiment , 2014, GPCE '13.

[34]  Sven Apel,et al.  Variability encoding: From compile-time to load-time variability , 2016, J. Log. Algebraic Methods Program..

[35]  Wolfgang Schröder-Preikschat,et al.  A quantitative analysis of aspects in the eCos kernel , 2006, EuroSys.

[36]  Martin Erwig,et al.  #ifdef confirmed harmful: Promoting understandable software variation , 2011, 2011 IEEE Symposium on Visual Languages and Human-Centric Computing (VL/HCC).

[37]  Daniel Lohmann,et al.  Analyzing the Impact of Feature Changes in Linux , 2016, VaMoS.

[38]  Anas N. Al-Rabadi,et al.  A comparison of modified reconstructability analysis and Ashenhurst‐Curtis decomposition of Boolean functions , 2004 .

[39]  Wolfgang Schröder-Preikschat,et al.  Automatic OS Kernel TCB Reduction by Leveraging Compile-Time Configurability , 2012, HotDep.

[40]  Mark Newman,et al.  Networks: An Introduction , 2010 .

[41]  Robert Grimm,et al.  SuperC: parsing all of C by taming the preprocessor , 2012, PLDI.

[42]  Khaled El Emam,et al.  The Confounding Effect of Class Size on the Validity of Object-Oriented Metrics , 2001, IEEE Trans. Software Eng..

[43]  Jonathan I. Maletic,et al.  An XML-Based Lightweight C++ Fact Extractor , 2003, IWPC.

[44]  Krzysztof Czarnecki,et al.  A user survey of configuration challenges in Linux and eCos , 2012, VaMoS '12.