A Causal Classification of Orthography Errors in Web Texts

Errors, even at the spelling level, can provide useful insight into the nature of a written text. This paper presents a classification of spelling errors in Web texts based on their causes (misspellings, typos and intentional deviations), linking them to the attitudes of their authors and the circumstances of their writing. Examples are drawn from blog and forum entries in English and Italian.
