A Large-Scale Security-Oriented Static Analysis of Python Packages in PyPI

Security issues are a common problem for open source packages archived in and delivered through software ecosystems. These issues often manifest themselves as software weaknesses that may lead to concrete software vulnerabilities. This paper examines various security issues in Python packages with static analysis. The dataset is based on a snapshot of all packages stored in the Python Package Index (PyPI). In total, over 197 thousand packages and over 749 thousand security issues are covered. Even under the constraints imposed by static analysis, (a) the results indicate that security issues are prevalent: at least one issue is present in about 46% of the Python packages. In terms of issue types, (b) issues related to exception handling and various forms of code injection have been the most common. The subprocess module stands out in this regard. Reflecting the generally small size of the packages, (c) software size metrics are poor predictors of the number of issues revealed through static analysis. With these results and the accompanying discussion, the paper contributes to the field of large-scale empirical studies for better understanding security problems in software ecosystems.
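Exception handling and code injection through the subprocess module are named above as the most common issue types. As a rough illustration of what a static analyzer looks for, the following minimal sketch uses Python's standard `ast` module to flag two such patterns without running the code: an `except` block whose only statement is `pass`, and a `subprocess` call made with `shell=True`. The function `find_issues` and the issue labels are hypothetical and chosen for this example; this is not the tool or rule set used in the study.

```python
import ast

# Hypothetical issue labels for this sketch; real analyzers use their own IDs.
SWALLOWED_EXCEPTION = "swallowed-exception"
SHELL_INJECTION = "subprocess-shell-true"

SUBPROCESS_FUNCS = {"call", "check_call", "check_output", "run", "Popen"}


def _is_subprocess_call(node: ast.Call) -> bool:
    """True only for calls spelled as subprocess.<func>(...)."""
    func = node.func
    return (
        isinstance(func, ast.Attribute)
        and func.attr in SUBPROCESS_FUNCS
        and isinstance(func.value, ast.Name)
        and func.value.id == "subprocess"
    )


def find_issues(source: str) -> list[tuple[int, str]]:
    """Return sorted (line, label) pairs for suspicious patterns in source."""
    issues = []
    for node in ast.walk(ast.parse(source)):
        # An except handler whose entire body is `pass` silently hides errors.
        if isinstance(node, ast.ExceptHandler):
            if len(node.body) == 1 and isinstance(node.body[0], ast.Pass):
                issues.append((node.lineno, SWALLOWED_EXCEPTION))
        # shell=True passes the command string to a shell, so untrusted input
        # inside it may be interpreted as additional shell commands.
        elif isinstance(node, ast.Call) and _is_subprocess_call(node):
            for kw in node.keywords:
                if (
                    kw.arg == "shell"
                    and isinstance(kw.value, ast.Constant)
                    and kw.value.value is True
                ):
                    issues.append((node.lineno, SHELL_INJECTION))
    # ast.walk is breadth-first, so sort to report issues in line order.
    return sorted(issues)


sample = '''
import subprocess

def clean(path):
    try:
        subprocess.call("rm -rf " + path, shell=True)
    except OSError:
        pass
'''

print(find_issues(sample))
# → [(6, 'subprocess-shell-true'), (7, 'swallowed-exception')]
```

The sketch matches only the literal `subprocess.<func>` spelling, so code that imports the function under another name (for example `from subprocess import call`) goes unflagged. Such misses are one concrete instance of the constraints that static analysis imposes on counts like those reported above.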
