Word frequency of written Urdu

Performance on word processing tasks is known to be influenced by the frequency with which words occur in a language. Large and robust effects of word frequency occur across languages, and the processes thought to be sensitive to word frequency are considered fundamentally important characteristics of the mental lexicon. To our knowledge, no word frequency data exist for Urdu, an important language with characteristics that make it appealing to psycholinguists. Unfortunately, most of the Urdu published electronically is in the form of image files rather than text and has therefore been largely inaccessible to programs designed to generate word counts. Consequently, unlike for other important orthographies (e.g., English), orthographic word frequencies in Urdu are not readily available. We describe here the development of a word frequency database for written Urdu that addresses this methodological gap. We also report data from simple tests of the effects of Urdu word frequency, demonstrating that our measure produces the effects considered to be the hallmark of word frequency. The frequency counts from this database will help psycholinguists and cognitive psychologists conduct and control future studies on the mental lexicon using Urdu. The database can be downloaded from http://web2.uwindsor.ca/psychology/urdufrequency/