From Character to Word Level: Enabling the Linguistic Analyses of Inputlog Process Data

Keystroke-logging tools are widely used in writing process research. They capture each character and mouse movement as an isolated event, and these events serve as indicators of the underlying cognitive processes. The current research project explores the possibilities of aggregating the logged process data from the letter level (keystroke) to the word level by merging them with existing lexica and applying NLP tools, which enables researchers to analyze the data at a higher, more complex level. In this project the output data of Inputlog are segmented at the sentence level and then tokenized. However, writing process data by definition do not always represent clean and grammatical text, and coping with this problem was one of the main challenges of the project. To address it, a parser has been developed that extracts three types of data from the S-notation: word-level revisions, deleted fragments, and the final writing product. Within-word typing errors are identified and excluded from further analyses. At this stage the Inputlog process data are enriched with the following linguistic information: part-of-speech tags, lemmas, chunks, syllable boundaries, and word frequencies.
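
The parsing step described above can be illustrated with a minimal sketch. It is not the project's parser. It assumes a simplified, hypothetical notation in which `[text]` marks deleted text and `{text}` marks inserted text; real S-notation also carries revision indices and break markers, which are omitted here. The function name `parse_s_notation`, the result keys, and the rule for telling word-level revisions from within-word typing errors (a deletion that touches a letter or digit on either side counts as within-word) are all assumptions made for illustration.

```python
import re

# Simplified, hypothetical notation (an assumption, not real S-notation):
#   [text] = deleted text, {text} = inserted text, no nesting, no indices.
DELETION = re.compile(r"\[([^\[\]]*)\]")
INSERTION = re.compile(r"\{([^{}]*)\}")


def parse_s_notation(s):
    """Split a simplified S-notation string into the final product,
    all deleted fragments, word-level revisions, and typing errors."""
    deleted_fragments = []
    word_level_revisions = []
    typing_errors = []

    for match in DELETION.finditer(s):
        fragment = match.group(1)
        deleted_fragments.append(fragment)

        # Characters directly outside the brackets; string edges act as spaces.
        before = s[match.start() - 1] if match.start() > 0 else " "
        after = s[match.end()] if match.end() < len(s) else " "

        # Assumed heuristic: a deletion glued to a letter or digit sits
        # inside a word and is treated as a within-word typing error.
        if before.isalnum() or after.isalnum():
            typing_errors.append(fragment)
        else:
            word_level_revisions.append(fragment)

    # Final product: drop deleted spans, keep inserted text without braces,
    # then collapse the whitespace left behind by removed spans.
    final_product = INSERTION.sub(r"\1", DELETION.sub("", s))
    final_product = re.sub(r"\s+", " ", final_product).strip()

    return {
        "final_product": final_product,
        "deleted_fragments": deleted_fragments,
        "word_level_revisions": word_level_revisions,
        "typing_errors": typing_errors,
    }


if __name__ == "__main__":
    print(parse_s_notation("The [quick] {fast} fox jum[o]ps"))
```

For the sample string, the deletion of "quick" is bounded by spaces and is classified as a word-level revision, while the deleted "o" inside "jumps" is classified as a typing error and can be excluded from later analyses. The final product, "The fast fox jumps", is the text that would then be passed on to tokenization and linguistic annotation.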
