Procura-PALavras (P-PAL): A Web-based interface for a new European Portuguese lexical database

In this article, we present Procura-PALavras (P-PAL), a Web-based interface for a new European Portuguese (EP) lexical database. Based on a contemporary printed corpus of over 227 million words, P-PAL provides a broad range of word attributes and statistics for ~53,000 lemmatized and ~208,000 nonlemmatized EP word forms. These include several measures of word frequency (e.g., raw counts, per-million word frequency, logarithmic Zipf scale) and morpho-syntactic information (e.g., parts of speech [PoSs], grammatical gender and number, dominant PoS, and frequency and relative frequency of the dominant PoS). They also include lexical and sublexical orthographic measures (e.g., number of letters; consonant–vowel orthographic structure; density and frequency of orthographic neighbors; orthographic Levenshtein distance; orthographic uniqueness point; orthographic syllabification; and trigram, bigram, and letter type and token frequencies) and phonological measures (e.g., pronunciation, number of phonemes, stress, density and frequency of phonological neighbors, transposed and phonographic neighbors, syllabification, and biphone and phone type and token frequencies). To obtain these metrics, researchers can choose between two types of word query in the application: (i) analyze a list of words they have already selected, obtaining specific attributes and/or lexical and sublexical characteristics for them, or (ii) generate word lists that meet requirements defined by the user in the menu of analyses. Given the measures it provides and the flexibility it allows, P-PAL will be a key resource to support research in all cognitive areas that use EP verbal stimuli. P-PAL is freely available at http://p-pal.di.uminho.pt/tools.
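To make a few of the measures named above concrete, the following Python sketch shows how per-million frequency, a Zipf value, Levenshtein distance, and substitution neighbors are commonly defined. It is an illustration only and not P-PAL's implementation: the function names, the rounded corpus size, and the toy lexicon are assumptions, and the Zipf value uses the plain definition (log10 of frequency per billion words) without any smoothing that a database may apply.

```python
import math

# Assumption: rounded corpus size taken from the abstract ("over 227 million words").
CORPUS_SIZE = 227_000_000


def per_million(raw_count, corpus_size=CORPUS_SIZE):
    """Frequency per million words from a raw corpus count."""
    return raw_count * 1_000_000 / corpus_size


def zipf(raw_count, corpus_size=CORPUS_SIZE):
    """Zipf value: log10 of frequency per billion words, i.e. log10(fpm) + 3.

    Plain definition without smoothing; raw_count must be > 0.
    """
    return math.log10(per_million(raw_count, corpus_size)) + 3


def levenshtein(a, b):
    """Minimum number of insertions, deletions, and substitutions turning a into b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                # deletion
                            curr[j - 1] + 1,            # insertion
                            prev[j - 1] + (ca != cb)))  # substitution or match
        prev = curr
    return prev[-1]


def substitution_neighbors(word, lexicon):
    """Same-length words that differ from `word` in exactly one letter position."""
    return [w for w in lexicon
            if len(w) == len(word)
            and sum(x != y for x, y in zip(w, word)) == 1]


# Hypothetical toy lexicon, for illustration only.
toy_lexicon = ["pato", "gata", "casa", "gatos", "gato"]
print(zipf(227))                                    # a word seen 227 times
print(levenshtein("casa", "casas"))
print(substitution_neighbors("gato", toy_lexicon))
```

Neighborhood density is then the number of such neighbors, and neighborhood frequency measures can be obtained by looking up the frequency of each neighbor.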
