Source Code Authorship Attribution

[1]  Ian D. Watson,et al.  An Introduction to Case-Based Reasoning , 1995, UK Workshop on Case-Based Reasoning.

[2]  Jack Grieve,et al.  Quantitative Authorship Attribution: An Evaluation of Techniques , 2007, Lit. Linguistic Comput..

[3]  Dale Schuurmans,et al.  Language independent authorship attribution using character level language models , 2003, Proceedings of the tenth conference on European chapter of the Association for Computational Linguistics - EACL '03.

[4]  S. K. Robinson,et al.  An empirical approach for detecting program similarity and plagiarism within a university programming environment , 1987 .

[5]  Naeem Seliya,et al.  Detecting outsourced student programming assignments , 2008 .

[6]  Chris F. Kemerer,et al.  An empirical validation of software cost estimation models , 1987, CACM.

[7]  Elliot Soloway,et al.  Learning to program = learning to construct mechanisms and explanations , 1986, CACM.

[8]  Paul Clough,et al.  Creating A Corpus of Plagiarised Academic Texts , 2009 .

[9]  Jörg Kindermann,et al.  Authorship Attribution with Support Vector Machines , 2003, Applied Intelligence.

[10]  Mansur H. Samadzadeh,et al.  Extraction of Java program fingerprints for software authorship identification , 2004, J. Syst. Softw..

[11]  Stephen G. MacDonell,et al.  Software forensics for discriminating between program authors using case-based reasoning, feedforward neural networks and multiple discriminant analysis , 1999, ICONIP'99. ANZIIS'99 & ANNES'99 & ACNN'99. 6th International Conference on Neural Information Processing. Proceedings (Cat. No.99EX378).

[12]  Benno Stein,et al.  Intrinsic Plagiarism Analysis with Meta Learning , 2007, PAN.

[13]  Sergey Brin,et al.  The Anatomy of a Large-Scale Hypertextual Web Search Engine , 1998, Comput. Networks.

[14]  K.W. Bowyer,et al.  Experience using "MOSS" to detect cheating on programming assignments , 1999, FIE'99 Frontiers in Education. 29th Annual Frontiers in Education Conference. Designing the Future of Science and Engineering Education. Conference Proceedings (IEEE Cat. No.99CH37011).

[15]  Sviatoslav Voloshynovskiy,et al.  Multiclass classification based on binary classifiers: On coding matrix design, reliability and maximum number of classes , 2009 .

[16]  横山 俊伸,et al.  Overseas Business Trip Report: McMaster University , 2005 .

[17]  Justin Zobel,et al.  Entropy-Based Authorship Search in Large Document Collections , 2007, ECIR.

[18]  S. B. Needleman,et al.  A general method applicable to the search for similarities in the amino acid sequence of two proteins. , 1970, Journal of molecular biology.

[19]  Patrick Juola,et al.  Proving and Improving Authorship Attribution Technologies , 2004 .

[20]  Patrick Brennan,et al.  A Prototype for Authorship Attribution Studies , 2006, Lit. Linguistic Comput..

[21]  Justin Zobel,et al.  Passage retrieval revisited , 1997, SIGIR '97.

[22]  Benno Stein,et al.  Plagiarism analysis, authorship identification, and near-duplicate detection PAN'07 , 2007, SIGF.

[23]  Stefanos Gritzalis,et al.  Identifying Authorship by Byte-Level N-Grams: The Source Code Author Profile (SCAP) Method , 2007, Int. J. Digit. Evid..

[24]  Stephen G. MacDonell,et al.  A Fuzzy Logic Approach to Computer Software Source Code Authorship Analysis , 1997, ICONIP.

[25]  N. Littlestone Learning Quickly When Irrelevant Attributes Abound: A New Linear-Threshold Algorithm , 1987, 28th Annual Symposium on Foundations of Computer Science (sfcs 1987).

[26]  Spiros Mancoridis,et al.  A genetic algorithm for solving the binning problem in networked applications detection , 2007, 2007 IEEE Congress on Evolutionary Computation.

[27]  Moshe Koppel,et al.  Measuring Differentiability: Unmasking Pseudonymous Authors , 2007, J. Mach. Learn. Res..

[28]  Justin Zobel,et al.  Using Relative Entropy for Authorship Attribution , 2006, AIRS.

[29]  Margaret Hamilton,et al.  Software development marketplaces: implications for plagiarism , 2007 .

[30]  Ian H. Witten,et al.  Managing Gigabytes: Compressing and Indexing Documents and Images , 1999 .

[31]  Gabriella Kazai INitiative for the Evaluation of XML Retrieval , 2009, Encyclopedia of Database Systems.

[32]  Robert J. Gaizauskas,et al.  Building and annotating a corpus for the study of journalistic text reuse , 2002, LREC.

[33]  Robert Bosch,et al.  Separating Hyperplanes and the Authorship of the Disputed Federalist Papers , 1998 .

[34]  Boumediene Belkhouche,et al.  Plagiarism detection in software designs , 2004, ACM-SE 42.

[35]  Spiros Mancoridis,et al.  Using code metric histograms and genetic algorithms to perform author identification for software forensics , 2007, GECCO '07.

[36]  J. Pennebaker,et al.  Words of Wisdom: Language Use Over the Life Span , 2003 .

[37]  M. H. Halstead,et al.  Natural laws controlling algorithm structure? , 1972, SIGP.

[38]  Greg J. Michaelson,et al.  Automatic analysis of functional program style , 1996, Proceedings of 1996 Australian Software Engineering Conference.

[39]  Justin Zobel,et al.  Efficient plagiarism detection for large code repositories , 2007 .

[40]  Michael J. Wise,et al.  YAP3: improved detection of similarities in computer program and other texts , 1996, SIGCSE '96.

[41]  K. J. Ottenstein An algorithmic approach to the detection and prevention of plagiarism , 1976, SGCS.

[42]  Robert L. Glass Special Feature: Software Theft , 1985, IEEE Software.

[43]  Curtis R. Cook,et al.  A taxonomy for programming style , 1990, CSC '90.

[44]  F. Mosteller,et al.  Inference in an Authorship Problem , 1963 .

[45]  Martin D. S. Braine,et al.  The Ontogeny of English Phrase Structure: The First Phase , 1963 .

[46]  Eugene H. Spafford,et al.  Authorship analysis: identifying the author of a program , 1997, Comput. Secur..

[47]  Alan M. Frieze,et al.  Min-Wise Independent Permutations , 2000, J. Comput. Syst. Sci..

[48]  Vlado Keselj,et al.  Detection of New Malicious Code Using N-grams Signatures , 2004, PST.

[49]  Pavel Paclík,et al.  Does SVM Really Scale Up to Large Bag of Words Feature Spaces? , 2007, IDA.

[50]  Judithe Sheard,et al.  Addressing student cheating: definitions and solutions , 2003, ACM SIGCSE Bull..

[51]  Stephen G. MacDonell,et al.  Software Metrics Data Analysis—Exploring the Relative Performance of Some Commonly Used Modeling Techniques , 1999, Empirical Software Engineering.

[52]  Andrew Turpin,et al.  Temporally Robust Software Features for Authorship Attribution , 2009, 2009 33rd Annual IEEE International Computer Software and Applications Conference.

[53]  Harris Drucker,et al.  Support vector machines for spam categorization , 1999, IEEE Trans. Neural Networks.

[54]  Andrian Marcus,et al.  An information retrieval approach to concept location in source code , 2004, 11th Working Conference on Reverse Engineering.

[55]  Ian H. Witten,et al.  Data mining: practical machine learning tools and techniques, 3rd Edition , 1999 .

[56]  Efstathios Stamatatos,et al.  Source Code Authorship Analysis For Supporting the Cybercrime Investigation Process , 2010, Handbook of Research on Computational Forensics, Digital Crime, and Investigation.

[57]  Andrei Z. Broder,et al.  On the resemblance and containment of documents , 1997, Proceedings. Compression and Complexity of SEQUENCES 1997 (Cat. No.97TB100171).

[58]  Andrew Turpin,et al.  Application of Information Retrieval Techniques for Source Code Authorship Attribution , 2009, DASFAA.

[59]  Peter Vamplew,et al.  An Anti-Plagiarism Editor for Software Development Courses , 2005, ACE.

[60]  Efstathios Stamatatos A survey of modern authorship attribution methods , 2009 .

[61]  Chris F. Kemerer,et al.  An Empirical Approach to Studying Software Evolution , 1999, IEEE Trans. Software Eng..

[62]  Efstathios Stamatatos Author Identification Using Imbalanced and Limited Training Texts , 2007 .

[63]  Fred G. Harold Experimental evaluation of program quality using external metrics , 1986 .

[64]  Jaana Kekäläinen,et al.  Cumulated gain-based evaluation of IR techniques , 2002, TOIS.

[65]  Efstathios Stamatatos,et al.  Webpage Genre Identification Using Variable-Length Character n-Grams , 2007 .

[66]  Ian H. Witten,et al.  WEKA: a machine learning workbench , 1994, Proceedings of ANZIIS '94 - Australian New Zealand Intelligent Information Systems Conference.

[67]  Andrei Z. Broder,et al.  Identifying and Filtering Near-Duplicate Documents , 2000, CPM.

[68]  James T. Neill,et al.  Who cheats at university? A self-report study of dishonest academic behaviours in a sample of Australian university students , 2005 .

[69]  Efstathios Stamatatos,et al.  Computer-Based Authorship Attribution Without Lexical Measures , 2001, Comput. Humanit..

[70]  John R. Anderson,et al.  Learning to Program in LISP , 1984, Cogn. Sci..

[71]  Hector Garcia-Molina,et al.  SCAM: A Copy Detection Mechanism for Digital Documents , 1995, DL.

[72]  Vlado Keselj,et al.  N-gram-based detection of new malicious code , 2004, Proceedings of the 28th Annual International Computer Software and Applications Conference, 2004. COMPSAC 2004..

[73]  Michael Philippsen,et al.  Finding Plagiarisms among a Set of Programs with JPlag , 2002, J. Univers. Comput. Sci..

[74]  Eero Hyvönen,et al.  CEUR Workshop Proceedings , 2008 .

[75]  Christian S. Collberg,et al.  Self-plagiarism in computer science , 2005, CACM.

[76]  H. Altay Güvenir,et al.  Classification by Voting Feature Intervals , 1997, ECML.

[77]  Samuel L. Grier,et al.  A tool that detects plagiarism in Pascal programs , 1981, SIGCSE '81.

[78]  Alistair Moffat,et al.  Rank-biased precision for measurement of retrieval effectiveness , 2008, TOIS.

[79]  Fazli Can,et al.  Change of Writing Style with Time , 2004, Comput. Humanit..

[80]  Efstathios Stamatatos,et al.  Automatic Authorship Attribution , 1999, EACL.

[81]  Rong Zheng,et al.  Authorship Analysis in Cybercrime Investigation , 2003, ISI.

[82]  Fabrizio Sebastiani,et al.  Machine learning in automated text categorisation: a survey , 1999 .

[83]  Shlomo Argamon,et al.  Style mining of electronic messages for multiple authorship discrimination: first results , 2003, KDD '03.

[84]  Michelle Craig,et al.  Plagiarism detection using feature-based neural networks , 2007, SIGCSE.

[85]  Justin Zobel Uni Cheats Racket: A Case Study in Plagiarism Investigation , 2004, ACE.

[86]  Hinrich Schütze,et al.  Introduction to information retrieval , 2008 .

[87]  Bin Ma,et al.  Chain letters & evolutionary histories. , 2003, Scientific American.

[88]  Ying Zhao,et al.  Authorship Attribution Via Combination of Evidence , 2007, ECIR.

[89]  Erkki Sutinen,et al.  Fast Plagiarism Detection System , 2005, SPIRE.

[90]  Clark S. Lindsey,et al.  JavaTech, an Introduction to Scientific and Technical Computing with Java , 2005 .

[91]  Ahmad-Reza Sadeghi,et al.  Advanced techniques for dispute resolving and authorship proofs on digital works , 2003, IS&T/SPIE Electronic Imaging.

[92]  Stefanos Gritzalis,et al.  Supporting the cybercrime investigation process: Effective discrimination of source code authors based on byte-level information , 2005, ICETE.

[93]  Luis Gravano,et al.  dSCAM: finding document copies across multiple databases , 1996, Fourth International Conference on Parallel and Distributed Information Systems.

[94]  Sally S. Robinson,et al.  An instructional aid for student programs , 1980, SIGCSE '80.

[95]  Shlomo Argamon,et al.  Automatically Categorizing Written Texts by Author Gender , 2002, Lit. Linguistic Comput..

[96]  G. G. Stokes "J." , 1890, The New Yale Book of Quotations.

[97]  E. P. Schan,et al.  Recommended C Style and Coding Standards , 1997 .

[98]  Eugene H. Spafford,et al.  The internet worm program: an analysis , 1989, CCRV.

[99]  Bernard De Baets,et al.  A Connectionist Fuzzy Case-Based Reasoning Model , 2006, MICAI.

[100]  Eric Bauer,et al.  An Empirical Comparison of Voting Classification Algorithms: Bagging, Boosting, and Variants , 1999, Machine Learning.

[101]  Fuchun Peng,et al.  N-gram-based Author Profiles for Authorship Attribution , 2003 .

[102]  Cristian Grozea,et al.  ENCOPLOT: Pairwise Sequence Matching in Linear Time Applied to Plagiarism Detection , 2009 .

[103]  Anat Rachel Shimoni,et al.  Gender, genre, and writing style in formal written texts , 2003 .

[104]  J. Howard Johnson,et al.  Identifying redundancy in source code using fingerprints , 1993, CASCON.

[105]  Thomas Lavergne Unnatural language detection , 2006, CORIA.

[106]  Benno Stein,et al.  Intrinsic Plagiarism Detection , 2006, ECIR.

[107]  Sriram Raghavan,et al.  Searching the Web , 2001, ACM Trans. Internet Techn..

[108]  John D. Burger,et al.  An Exploration of Observable Features Related to Blogger Age , 2006, AAAI Spring Symposium: Computational Approaches to Analyzing Weblogs.

[109]  Spiros Mancoridis,et al.  On the Use of Discretized Source Code Metrics for Author Identification , 2009, 2009 1st International Symposium on Search Based Software Engineering.

[110]  Gilad Mishne,et al.  Source Code Retrieval using Conceptual Similarity , 2004, RIAO.

[111]  Patrick Juola,et al.  Authorship Attribution , 2008, Found. Trends Inf. Retr..

[112]  Roland H. Untch,et al.  A small and secure submission system for UNIX systems , 2005, ACM-SE 43.

[113]  Robert Parry,et al.  Third degree. , 1997, Nursing standard (Royal College of Nursing (Great Britain) : 1987).

[114]  Gregory A. Hall,et al.  Toward Defining the Intersection of Forensics and Information Technology , 2005, Int. J. Digit. Evid..

[115]  Diana Inkpen,et al.  Using the Complexity of the Distribution of Lexical Elements as a Feature in Authorship Attribution , 2008, LREC.

[116]  Glenn Gamst,et al.  Applied Multivariate Research: Design and Interpretation , 2005 .

[117]  Alan Nash,et al.  The Elements of C Programming Style , 1992 .

[118]  Arthur M. Lesk,et al.  Introduction to bioinformatics , 2002 .

[119]  Alberto Barrón-Cedeño,et al.  Reducing the Plagiarism Detection Search Space on the Basis of the Kullback-Leibler Distance , 2009, CICLing.

[120]  Maxim Mozgovoy Enhancing Computer-Aided Plagiarism Detection , 2008 .

[121]  M S Waterman,et al.  Identification of common molecular subsequences. , 1981, Journal of molecular biology.

[122]  M Hamilton,et al.  Educating students about plagiarism avoidance - A computer science perspective , 2004 .

[123]  Stefanos Gritzalis,et al.  Effective identification of source code authors using byte-level information , 2006, ICSE.

[124]  Ann-Marie Lancaster,et al.  A plagiarism detection system , 1981, SIGCSE '81.

[125]  H. E. Dunsmore,et al.  Software engineering metrics and models , 1986 .

[126]  Kenneth J. Stevens,et al.  The Introduction and Assessment of Three Teaching Tools (WebCT, Mindtrail, EVE) into a Post Graduate Course , 2002, J. Inf. Technol. Educ..

[127]  Seyed M. M. Tahaghoghi,et al.  Plagiarism detection across programming languages , 2006, ACSC.

[128]  George M. Mohay,et al.  Mining e-mail content for author identification forensics , 2001, SGMD.

[129]  Alistair Moffat,et al.  Exploring the similarity space , 1998, SIGF.

[130]  Eugene H. Spafford,et al.  The internet worm: crisis and aftermath , 1989 .

[131]  Patrick Juola,et al.  A Controlled-corpus Experiment in Authorship Identification by Cross-entropy , 2003 .

[132]  Marcus A. Maloof,et al.  Learning to detect malicious executables in the wild , 2004, KDD.

[133]  Edsger W. Dijkstra,et al.  Go to Statement Considered Harmful (Reprint) , 2002, Software Pioneers.

[134]  Justin Zobel,et al.  Effective and Scalable Authorship Attribution Using Function Words , 2005, AIRS.

[135]  Stephen G. MacDonell,et al.  IDENTIFIED (Integrated Dictionary-based Extraction of Non-language-dependent Token Information for Forensic Identification, Examination, and Discrimination): a dictionary-based system for extracting source code metrics for software forensics , 1998, Proceedings. 1998 International Conference Software Engineering: Education and Practice (Cat. No.98EX220).

[136]  Lloyd A. Smith,et al.  Practical feature subset selection for machine learning , 1998 .

[137]  Steven Garcia,et al.  RMIT University at TREC 2005: Terabyte and Robust Track , 2005, TREC.

[138]  Justin Zobel,et al.  Methods for Identifying Versioned and Plagiarized Documents , 2003, J. Assoc. Inf. Sci. Technol..

[139]  Stephen G. MacDonell,et al.  Applications of fuzzy logic to software metric models for development effort estimation , 1997, 1997 Annual Meeting of the North American Fuzzy Information Processing Society - NAFIPS (Cat. No.97TH8297).

[140]  Sally Jo Cunningham,et al.  Applications of machine learning in information retrieval , 1999 .

[141]  Carl Eklund,et al.  National Institute for Standards and Technology , 2009, Encyclopedia of Biometrics.

[142]  Andrew Walenstein,et al.  Malware phylogeny generation using permutations of code , 2005, Journal in Computer Virology.

[143]  Elad Yom-Tov,et al.  Serial Sharers: Detecting Split Identities of Web Authors , 2007, PAN.

[144]  E. Eugene Schultz,et al.  Beyond preliminary analysis of the WANK and OILZ worms: a case study of malicious code , 1993, Comput. Secur..

[145]  Spiros Mancoridis,et al.  A Probabilistic Approach to Source Code Authorship Identification , 2007, Fourth International Conference on Information Technology (ITNG'07).

[146]  C. E. Shannon,et al.  A mathematical theory of communication , 1948, MOCO.

[147]  Chunju Tseng,et al.  The Arizona IDMatcher: developing an identity matching tool for law enforcement , 2007, DG.O.

[148]  Shlomo Argamon,et al.  Computational methods in authorship attribution , 2009 .

[149]  Alistair Moffat,et al.  Inverted Index Compression Using Word-Aligned Binary Codes , 2004, Information Retrieval.

[150]  Deborah G. Johnson,et al.  Australian Computer Society Code of Ethics Project (Part 1) , 2004 .

[151]  Ward E. Y. Elliott,et al.  And then there were none: Winnowing the Shakespeare claimants , 1996, Comput. Humanit..

[152]  Mehmet M. Dalkilic,et al.  Using Compression to Identify Classes of Inauthentic Texts , 2006, SDM.

[153]  Edward L. Jones Metrics Based Plagiarism Monitoring , 2001 .

[154]  Thomas P. Way,et al.  SNITCH: a software tool for detecting cut and paste plagiarism , 2006, SIGCSE '06.

[155]  Anas N. Al-Rabadi,et al.  A comparison of modified reconstructability analysis and Ashenhurst‐Curtis decomposition of Boolean functions , 2004 .

[156]  Jean-Marc Jézéquel,et al.  Design by Contract: The Lessons of Ariane , 1997, Computer.

[157]  Justin Zobel,et al.  Music Ranking Techniques Evaluated , 2000, ISMIR.

[158]  Daniel S. Hirschberg,et al.  A linear space algorithm for computing maximal common subsequences , 1975, Commun. ACM.

[159]  Stephen E. Robertson,et al.  A probabilistic model of information retrieval: development and comparative experiments - Part 1 , 2000, Inf. Process. Manag..

[160]  Stephen G. MacDonell,et al.  IDENTIFIED: software authorship analysis with case-based reasoning , 1998 .

[161]  Charlie Daly,et al.  A Technique for Detecting Plagiarism in Computer Code , 2005, Comput. J..

[162]  Stefanos Gritzalis,et al.  Examining the significance of high-level programming features in source code author classification , 2008, J. Syst. Softw..

[163]  Michael Gamon,et al.  Obfuscating Document Stylometry to Preserve Author Anonymity , 2006, ACL.

[164]  ChengXiang Zhai,et al.  A study of smoothing methods for language models applied to information retrieval , 2004, TOIS.

[165]  Stephen G. MacDonell,et al.  Software Forensics: Extending Authorship Analysis Techniques to Computer Programs , 2002 .

[166]  Cynthia A. Phillips,et al.  Constructing Computer Virus Phylogenies , 1996, J. Algorithms.

[167]  Curtis R. Cook,et al.  A paradigm for programming style research , 1988, SIGP.

[168]  Efstathios Stamatatos,et al.  Automatic Text Categorization In Terms Of Genre and Author , 2000, CL.

[169]  Daniel Shawcross Wilkerson,et al.  Winnowing: local algorithms for document fingerprinting , 2003, SIGMOD '03.

[170]  George Fernandez,et al.  Weblearn: a common gateway interface (CGI)-based environment for interactive learning , 2001 .