Interrogating the Generalizability of Portfolio Assessments of Beginning Teachers: A Qualitative Study

This qualitative study is intended to illuminate factors that affect the generalizability of portfolio assessments of beginning teachers. By generalizability, we refer here to the extent to which the portfolio assessment supports generalizations from the particular evidence reflected in the portfolio to the conception of competent teaching reflected in the standards on which the assessment is based. Or, more practically, “The key question is, ‘How likely is it that this finding would be reversed or substantially altered if a second, independent assessment of the same kind were made?’” (Cronbach, Linn, Brennan, and Haertel, 1997, p. 1). In addressing this question, we draw on two kinds of evidence that are rarely available: comparisons of two different portfolios completed by the same teacher in the same year and comparisons between a portfolio and a multi-day case study (observation and interview completed shortly after portfolio submission) intended to parallel the evidence called for in the portfolio assessment. Our formative goal is to illuminate issues that assessment developers and users can take into account in designing assessment systems and appropriately limiting score interpretations.

[1]  P. Moss Shifting Conceptions of Validity in Educational Measurement: Implications for Performance Assessment , 1992 .

[2]  Stephen B. Dunbar,et al.  Quality Control in the Development and Use of Performance Assessments , 1991 .

[3]  Edward H. Haertel,et al.  Validating Standards-Based Test Score Interpretations , 2004 .

[4]  Pamela A. Moss,et al.  Can There Be Validity Without Reliability? , 1994 .

[5]  Samuel Messick Validity and washback in language testing , 1996 .

[6]  G. Engelhard,et al.  Examining the Psychometric Quality of the National Board for Professional Teaching Standards Early Childhood/Generalist Assessment System , 2001 .

[7]  Donald B. Rubin,et al.  The Dependability of Behavioral Measurements: Theory of Generalizability for Scores and Profiles. , 1974 .

[8]  Susy Macqueen,et al.  Validity , 1973, Just Algorithms.

[9]  Daniel Koretz,et al.  Can Portfolios Assess Student Performance and Influence Instruction? The 1991-92 Vermont Experience. , 1992 .

[10]  R. Linn,et al.  Qualitative methods in research on teaching , 1985 .

[11]  Susan R. Goldman,et al.  Alternative Technologies for Large Scale Science Assessment: Instrument of Education Reform , 1991 .

[12]  Pamela A. Moss,et al.  An Integrative Approach to Portfolio Evaluation for Teacher Licensure , 1998 .

[13]  Robert L. Brennan,et al.  Generalizability of Performance Assessments , 2005 .

[14]  Pamela A. Moss,et al.  Enlarging the Dialogue in Educational Measurement: Voices From Interpretive Research Traditions , 1996 .

[15]  Edward H. Haertel Construct Validity and Criterion-Referenced Testing , 1985 .

[16]  H. Rickman,et al.  Contemporary Hermeneutics: Hermeneutics as Method, Philosophy and Critique , 1982 .

[17]  Robert L. Brennan,et al.  An Essay on the History and Future of Reliability from the Perspective of Replications , 2001 .

[18]  R. Almond,et al.  Focus Article: On the Structure of Educational Assessments , 2003 .

[19]  P. Moss,et al.  Educational Standards, Assessment, and the Search for Consensus , 2001 .

[20]  Mark D. Reckase Portfolio Assessment: A Theoretical Estimate of Score Reliability , 2005 .

[21]  Addressing Reliability Problems in the Portfolio Assessment of College Writing , 1993 .

[22]  Xiaohong Gao,et al.  Generalizability of Large-Scale Performance Assessments in Science: Promises and Problems , 1994 .

[23]  Robert L. Linn,et al.  Performance-Based Assessment: Implications of Task Specificity , 2005 .

[24]  Dc Washington,et al.  Interstate New Teacher Assessment and Support Consortium. , 1992 .

[25]  P. Moss,et al.  Risking Frankness in Educational Assessment , 1999 .

[26]  Eva Nick,et al.  The dependability of behavioral measurements: theory of generalizability for scores and profiles , 1973 .

[27]  Mei Liu,et al.  Generalizability and Validity of a Mathematics Performance Assessment , 1996 .

[28]  Milbrey W. McLaughlin,et al.  Teachers' work : individuals, colleagues, and contexts , 1993 .

[29]  Daniel Koretz,et al.  The Reliability of Scores from the 1992 Vermont Portfolio Assessment Program. Interim Report. , 1992 .

[30]  Maridyth M. McBee,et al.  The Generalizability of a Performance Assessment Measuring Achievement in Eighth-Grade Mathematics , 1998 .

[31]  R. Mislevy Evidence and inference in educational assessment , 1994 .

[32]  Daniel F. McCaffrey,et al.  Interim Report: The Reliability of Vermont Portfolio Scores in the 1992-93 School Year , 1994 .

[33]  S. Messick The Interplay of Evidence and Consequences in the Validation of Performance Assessments , 1994 .

[34]  Allan S. Cohen,et al.  Validating Measures of Performance , 2005 .

[35]  N. Lyons,et al.  With Portfolio in Hand: Validating the New Teacher Professionalism , 1998 .

[36]  R. Brennan Elements of generalizability theory , 1983 .

[37]  R. Glaser,et al.  Knowing What Students Know: The Science and Design of Educational Assessment , 2001 .

[38]  Edward H. Haertel,et al.  Generalizability Analysis for Performance Assessments of Student Achievement or School Effectiveness , 1997 .

[39]  D. Schum,et al.  A Probabilistic Analysis of the Sacco and Vanzetti Evidence , 1996 .

[40]  C. Dwyer Psychometrics of Praxis III: Classroom Performance Assessments , 1998 .

[41]  Geoffrey R. Norman,et al.  Performance-Based Assessment: Lessons From the Health Professions , 1995 .

[42]  Donald A. Rock,et al.  Assessing Writing Skill , 1988 .

[43]  Susan S. Stodolsky,et al.  The Impact of Subject Matter on Curricular Activity: An Analysis of Five Academic Subjects , 1995 .

[44]  R. Jaeger Evaluating the Psychometric Qualities of the National Board for Professional Teaching Standards' Assessments: A Methodological Accounting , 1998 .

[45]  George Engelhard,et al.  Examining Rater Errors in the Assessment of Written Composition With a Many-Faceted Rasch Model , 1994 .

[46]  P. Moss The Role of Consequences in Validity Theory , 2005 .

[47]  Mark Wilson,et al.  From Principles to Practice: An Embedded Assessment System , 2000 .

[48]  S. Noakes The Hermeneutic Tradition: From Ast to Ricoeur , 1990 .

[49]  Linda M. McNeil,et al.  Contradictions of Control: School Structure and School Knowledge , 1988 .

[50]  Lee J. Cronbach,et al.  Construct validation after thirty years. , 1989 .

[51]  Daniel F. McCaffrey,et al.  The Reliability of Mathematics Portfolio Scores: Lessons From the Vermont Experience , 1995 .

[52]  P. Hamilton The Hermeneutic Tradition , 2003 .

[53]  Richard J. Shavelson,et al.  Generalizability Theory: A Primer , 1991 .

[54]  Robert J. Mislevy,et al.  Monitoring and Improving a Portfolio Assessment System. , 1995 .

[55]  Mark R. Wilson,et al.  An Examination of Variation in Rater Severity Over Time : A Study in Rater Drift , 2000 .

[56]  Clifton Bob Clark,et al.  The University of North Carolina at Greensboro , 1980 .

[57]  Michael S. Knapp,et al.  Social Class and Schooling. , 1995 .

[58]  E. Mandinach,et al.  Formative Studies of Praxis III: Classroom Performance Assessments, An Overview , 1993 .

[59]  Ginette Delandshere,et al.  Capturing Teachers' Knowledge: Performance Assessment , 1994 .

[60]  J. Talbert,et al.  The Contexts of teaching in secondary schools : teachers' realities , 1990 .

[61]  Robert J. Mislevy,et al.  Psychometric Principles in Student Assessment , 2003 .