Practical Language Testing

1. Testing and assessment in context
2. Standardised testing
3. Classroom assessment
4. Deciding what to test
5. Designing test specifications
6. Evaluating, prototyping and piloting
7. Scoring language tests
8. Aligning tests to standards
9. Test administration
10. Testing and teaching
Epilogue
Glossary
