Evaluating neurological outcome measures: the bare essentials

Chronic impairment is a common consequence of neurological disease. Many of these disorders affect younger people and are often progressive over many decades. Measurement of health related outcomes such as disability, handicap, and quality of life is therefore important in the evaluation of therapeutic efficacy. To ensure sound measurement of these outcomes, the instruments used must have been comprehensively evaluated, not only for clinical appropriateness but also, and perhaps more importantly, for their scientific properties. Clinicians are often unfamiliar with the rigorous scientific techniques required to design and evaluate health measurement tools, largely because the theoretical foundations and methodological concepts, which originated in the social sciences, have been slow to transfer to medicine. This editorial introduces these concepts and provides a basis of knowledge for informed decision making in the evaluation of outcome measures.

Instruments for measurement of outcome must be evaluated in terms of both clinical usefulness and scientific soundness. If an instrument is to be clinically useful and acceptable enough to be incorporated into daily practice, it must be appropriate to the patient group being studied, brief, user friendly, practical to administer, and cost effective. Unwieldy, time consuming, and resource intensive instruments have limited use in clinical practice. Clinical utility, however, does not guarantee scientific soundness in terms of rigorous measurement.

The second and perhaps more important step in instrument evaluation is the assessment of three scientific properties which ensure reliable and valid measurement of the health outcome of interest:

(1) Reliability considers whether an instrument measures outcome in a way that is accurate, consistent, stable over time, and reproducible.
(2) Validity considers whether an instrument measures what it purports or is intended to measure.
(3) Responsiveness considers whether an instrument is sensitive to, and can detect, clinically important change.

The basic principles and methods of these three scientific concepts were developed in the social sciences, particularly psychology, where the need for rigorous measurement of abstract entities such as intelligence and personality stimulated conceptual and methodological advances that led to the establishment of psychometrics, the science of measurement. The foundations of this science were laid in the mid-1800s and were followed by extensive developments from the 1930s to the 1950s.

Clinical medicine has traditionally been concerned with simple and easy to measure outcomes such as mortality, presence or absence of disease, duration of survival, and duration of disease free interval. Developments in health care and changing social conditions have resulted in an increasing prevalence of chronic illnesses and led to a broader World Health Organisation definition of health as "a state of complete physical, mental, and social wellbeing and not merely the absence of disease or infirmity". These changes, coupled with recent developments including diagnostic advances,1 emergence of new treatments,2 and the importance of incorporating the patient perspective,3-5 have highlighted the inadequacy of traditional outcomes and have pointed to the need for the assessment of more pertinent but abstract concepts such as disability, handicap, and quality of life. The knowledge required to ensure that these complex entities are being measured with the necessary scientific rigour has yet to transfer from the social sciences and is generally unavailable to most clinicians. This may explain why, even though many instruments exist,8 measurement of disability and handicap is viewed as being in its infancy from a scientific point of view.7,9,10
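The three properties described above are usually quantified with summary statistics. The sketch below is illustrative only: it assumes Cronbach's coefficient alpha as one index of reliability (internal consistency), a Pearson correlation with an established comparison measure as one form of validity evidence, and the standardized response mean as one index of responsiveness. These are common choices in the psychometric literature, not necessarily those recommended in this editorial, and the function names and all scores are hypothetical.

```python
from statistics import mean, stdev, variance


def cronbach_alpha(items):
    """Reliability (internal consistency).

    `items` is a list of score lists, one list per scale item,
    each ordered by the same respondents.
    """
    k = len(items)
    totals = [sum(scores) for scores in zip(*items)]  # total score per respondent
    sum_item_var = sum(variance(item) for item in items)
    return k / (k - 1) * (1 - sum_item_var / variance(totals))


def pearson_r(x, y):
    """Validity evidence: correlation of the new scale with a comparison measure."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


def standardized_response_mean(before, after):
    """Responsiveness: mean change divided by the SD of the change scores."""
    changes = [a - b for b, a in zip(before, after)]
    return mean(changes) / stdev(changes)


if __name__ == "__main__":
    # Invented scores for five patients on a three-item disability scale.
    item_scores = [
        [2, 3, 4, 4, 5],
        [1, 3, 3, 5, 5],
        [2, 2, 4, 4, 4],
    ]
    new_scale = [sum(s) for s in zip(*item_scores)]
    comparison_scale = [20, 31, 38, 47, 52]   # hypothetical established measure
    after_treatment = [t + d for t, d in zip(new_scale, [1, 2, 0, 3, 2])]

    print("alpha:", round(cronbach_alpha(item_scores), 2))
    print("r with comparison scale:", round(pearson_r(new_scale, comparison_scale), 2))
    print("SRM:", round(standardized_response_mean(new_scale, after_treatment), 2))
```

Each statistic addresses only one facet of its property. Alpha, for example, says nothing about stability over time, which would need a test-retest design.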

[1]  R. Gelber. Measuring Disease: A Review of Disease-Specific Quality of Life Measurements Scales. 1996.

[2]  O. Devinsky. Outcome research in neurology: incorporating health-related quality of life. Annals of Neurology, 1995.

[3]  Shah Ebrahim, et al. A postal version of the Barthel Index. 1994.

[4]  A. Hopkins. Economic change and health service reform: likely impact on teaching, practice, and research in neurology. Journal of Neurology, Neurosurgery, and Psychiatry, 1994.

[5]  Johanne Martel. Measures of Need and Outcome for Primary Health Care. 1994.

[6]  R. S. Banner. The era of the patient. JAMA, 1993.

[7]  D. Wade, et al. Measurement in neurological rehabilitation. Current Opinion in Neurology and Neurosurgery, 1992.

[8]  D. H. Miller, et al. Magnetic resonance imaging in clinical practice. 1992.

[9]  R. A. Deyo, et al. Reproducibility and responsiveness of health status measures. Statistics and strategies for evaluation. Controlled Clinical Trials, 1991.

[10]  G. Norman, et al. Issues in the use of change scores in randomized trials. Journal of Clinical Epidemiology, 1989.

[11]  G. H. Guyatt, et al. Responsiveness and validity in health status measurement: a clarification. Journal of Clinical Epidemiology, 1989.

[12]  A. Anastasi. Psychological testing, 6th ed. 1988.

[13]  G. Guyatt, et al. Measuring change over time: assessing the usefulness of evaluative instruments. Journal of Chronic Diseases, 1987.

[14]  D. Altman, et al. Statistical methods for assessing agreement between two methods of clinical measurement. The Lancet, 1986.

[15]  R. A. Deyo, et al. Assessing the responsiveness of functional scales to clinical change: an analogy to diagnostic test performance. Journal of Chronic Diseases, 1986.

[16]  B. Kirshner, et al. A methodological framework for assessing health indices. Journal of Chronic Diseases, 1985.

[17]  Functional assessment measures in medical rehabilitation: current status. Archives of Physical Medicine and Rehabilitation, 1984.

[18]  Douglas G. Altman, et al. Measurement in Medicine: The Analysis of Method Comparison Studies. 1983.

[19]  D. Saccuzzo, et al. Psychological Testing: Principles, Applications, and Issues. 1982.

[20]  S. Messick. Test validity and the ethics of assessment. 1980.

[21]  J. Hallpike. New treatments for multiple sclerosis. British Journal of Hospital Medicine, 1980.

[22]  Edward G. Carmines, et al. Reliability and Validity Assessment. 1979.

[23]  J. Fleiss, et al. Intraclass correlations: uses in assessing rater reliability. Psychological Bulletin, 1979.

[24]  R. Sitgreaves. Psychometric theory (2nd ed.). 1979.

[25]  J. Fleiss. Measuring nominal scale agreement among many raters. 1971.

[26]  F. G. Brown, et al. Principles of educational and psychological testing. 1970.

[27]  J. Nunnally. Introduction to Psychological Measurement. 1970.

[28]  M. R. Novick, et al. Statistical Theories of Mental Test Scores. 1971.

[29]  Jacob Cohen, et al. Weighted kappa: nominal scale agreement provision for scaled disagreement or partial credit. 1968.

[30]  D. Campbell. Recommendations for APA test standards regarding construct, trait, or discriminant validity. 1960.

[31]  Jacob Cohen. A Coefficient of Agreement for Nominal Scales. 1960.

[32]  N. Tallent. Psychological testing. The American Journal of Nursing, 1960.

[33]  D. Campbell, et al. Convergent and discriminant validation by the multitrait-multimethod matrix. Psychological Bulletin, 1959.

[34]  L. Cronbach, et al. Construct validity in psychological tests. Psychological Bulletin, 1955.

[35]  L. Cronbach. Coefficient alpha and the internal structure of tests. 1951.

[36]  M. W. Richardson, et al. The theory of the estimation of test reliability. 1937.