CRITERION‐RELATED VALIDITY OF THE TOEFL IBT LISTENING SECTION

This study investigated the criterion-related validity of the Test of English as a Foreign Language™ Internet-based test (TOEFL® iBT) Listening section by examining its relationship to a criterion measure designed to reflect language-use tasks that university students encounter in everyday academic life: listening to academic lectures. The design of the criterion measure was informed by students' responses to a survey on the frequency and importance of various classroom tasks that require academic listening, and on the relationship of these tasks to successful course completion. The criterion measure consisted of three videotaped lectures (in physics, history, and psychology) and included tasks created by content experts who were former university professors in the relevant content areas. These tasks reflected what the content experts expected students to have comprehended from each lecture. The criterion measure and the TOEFL iBT Listening section were administered to nonnative speakers of English enrolled in undergraduate and graduate programs, and data from 221 participants were analyzed. Substantial correlations were observed between the criterion measure and the TOEFL iBT Listening section score, both for the entire sample and for subgroups (Pearson correlation coefficients ranged from .56 to .74, and disattenuated correlations from .62 to .82). Moreover, an analysis of mean criterion-measure scores for different ability groups indicated that participants who scored at or above typical cut scores for international student admission to academic programs (i.e., a TOEFL iBT Listening section score of 14 or above) scored, on average, close to or above 50% on the criterion measure, demonstrating reasonable comprehension of the academic lectures.
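
The disattenuated correlations reported above are presumably obtained with the standard correction for attenuation, which divides the observed Pearson correlation by the square root of the product of the two measures' reliabilities. The abstract does not state which reliability estimates were used, so the following is only a minimal sketch: the function name `disattenuate` and all input values are hypothetical and are not taken from the study.

```python
import math


def disattenuate(r_xy: float, rel_x: float, rel_y: float) -> float:
    """Correct an observed correlation for measurement error.

    r_xy  : observed Pearson correlation between the two measures
    rel_x : reliability estimate of the first measure (0 < rel_x <= 1)
    rel_y : reliability estimate of the second measure (0 < rel_y <= 1)
    """
    if not (0.0 < rel_x <= 1.0 and 0.0 < rel_y <= 1.0):
        raise ValueError("reliabilities must lie in the interval (0, 1]")
    return r_xy / math.sqrt(rel_x * rel_y)


# Hypothetical illustration only: these numbers are NOT from the study.
observed_r = 0.60          # assumed observed correlation
reliability_test = 0.80    # assumed reliability of the test section
reliability_crit = 0.90    # assumed reliability of the criterion measure

corrected_r = disattenuate(observed_r, reliability_test, reliability_crit)
print(f"disattenuated r = {corrected_r:.2f}")
```

Because reliabilities cannot exceed 1, the corrected value is never smaller in magnitude than the observed one, which is consistent with the disattenuated range (.62 to .82) sitting above the observed range (.56 to .74).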
