Recognizing authors: an examination of the consistent programmer hypothesis

Software developers have individual styles of programming. This paper empirically examines the validity of the consistent programmer hypothesis: that a facet or set of facets exists that can be used to recognize the author of a given program based on programming style. The paper further postulates that, because of these differences in style, different test strategies work better for some programmers (or programming styles) than for others. For example, all-edges adequate tests may detect faults in programs written by Programmer A better than in those written by Programmer B. The hypothesis has several useful applications: helping to detect plagiarism or copyright violation of source code, helping to improve the practical application of software testing, and helping to identify the rogue programmers behind malicious code and source code viruses. The paper investigates this concept by experimentally examining whether particular facets of a program can be used to identify programmers and whether testing strategies can be reasonably associated with specific programmers. Copyright © 2009 John Wiley & Sons, Ltd.
