The Gold Standard Paradox in Digital Image Analysis: Manual Versus Automated Scoring as Ground Truth.

CONTEXT - Novel therapeutics often target complex cellular mechanisms. Increasingly, quantitative methods such as digital tissue image analysis (tIA) are required to evaluate the correspondingly complex biomarkers and to elucidate subtle phenotypes that can inform treatment decisions with these targeted therapies. These tIA systems need a gold standard, or reference method, to establish analytical validity. Conventional, subjective histopathologic scores assigned by an experienced pathologist are the gold standard in anatomic pathology and are an attractive reference method: the pathologist's score can establish the ground truth against which a tIA solution's analytical performance is assessed. The paradox of this validation strategy, however, is that tIA is often used to assist pathologists in scoring complex biomarkers because it is more objective and reproducible than manual evaluation alone, overcoming known biases in human visual evaluation of tissue, and because it can generate endpoints that a human observer cannot.

OBJECTIVE - To discuss common visual and cognitive traps known in traditional pathology-based scoring paradigms that may affect the characterization of tIA-assisted scoring accuracy, sensitivity, and specificity.

DATA SOURCES - This manuscript reviews the literature of recent decades on traditional subjective pathology scoring paradigms and on the known cognitive and visual traps relevant to these paradigms.

CONCLUSIONS - Awareness of the gold standard paradox is necessary when using traditional pathologist scores to analytically validate a tIA tool, because image analysis is used specifically to overcome known sources of bias in the visual assessment of tissue sections.
