From Novice to Expert: Analysis of Token Level Effects in a Longitudinal Eye Tracking Study

Program comprehension is a vital skill in software development. This work investigates program comprehension by examining the eye movements of novice programmers as they gain programming experience over the duration of a Java course, and compares their eye movement behavior to that of expert programmers. Eye movement studies of natural-language text show that word frequency and word length influence eye movement duration and act as indicators of reading skill. The study uses an existing longitudinal eye tracking dataset with 20 novice and experienced readers of source code. It investigates whether novices acquire the effects of token frequency and token length in source code reading, treating these effects as an indication of program reading skill. The results show evidence of the frequency and length effects in reading source code and of the acquisition of these effects by novices. These results are then leveraged in a machine learning model, demonstrating how eye movements can be used to estimate programming proficiency and to distinguish novices from experts with 72% accuracy.
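To make the notion of a frequency or length effect concrete, the following is a minimal sketch that assumes each effect can be summarized as the slope of fixation duration regressed on token length or on log token frequency. The abstract does not state how the effects were actually measured, so the function names (`ols_slope`, `token_effects`), the record layout, and the sample values are all hypothetical and serve only as illustration.

```python
import math


def ols_slope(xs, ys):
    """Least-squares slope of ys regressed on a single predictor xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


def token_effects(fixations):
    """Summarize one reader's token-level effects.

    fixations: list of (token_length, corpus_frequency, duration_ms).
    This record layout is an assumption, not the dataset's real schema.
    Returns (length_slope, frequency_slope), each from a separate
    single-predictor regression.
    """
    lengths = [f[0] for f in fixations]
    log_freqs = [math.log10(f[1]) for f in fixations]
    durations = [f[2] for f in fixations]
    return ols_slope(lengths, durations), ols_slope(log_freqs, durations)


# Made-up fixation records for illustration only (not study data).
sample = [
    (3, 10000, 180.0),
    (6, 1000, 220.0),
    (9, 100, 260.0),
    (12, 10, 300.0),
]
length_slope, freq_slope = token_effects(sample)
print(f"length effect: {length_slope:.2f} ms per character")
print(f"frequency effect: {freq_slope:.2f} ms per log10 unit of frequency")
```

Under this reading, a positive length slope and a negative frequency slope would correspond to the effects described above: longer tokens are fixated longer, and more frequent tokens are fixated more briefly. Per-reader slopes of this kind are one plausible form of input feature for a proficiency classifier.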
