Recommendations for reporting the results of studies of instrument and scale development and testing.

Scales and instruments play an important role in health research and practice. Studies that report on their psychometric properties should do so in a way that lets readers understand what was done and what was found. This paper is a guide to writing articles about the development and assessment of these tools. It covers what should appear in the abstract and how key words should be chosen, then discusses what belongs in each main part of the paper: the introduction, methods, results and discussion. For each part, it suggests the statistical tests that should be used and how to report them. The paper emphasizes throughout that reliability and validity are not fixed properties of a scale, but depend on the interaction among the scale, the population being evaluated and the circumstances under which the instrument is administered.
