Reporting and interpreting non-significant results in animal cognition research

How statistically non-significant results are reported and interpreted following null hypothesis significance testing is often criticized. This issue is important for animal cognition research because studies in the field are often underpowered to detect theoretically meaningful effect sizes, i.e., they often produce non-significant p-values even when the null hypothesis is incorrect. We therefore manually extracted and classified how researchers report and interpret non-significant p-values, and examined the p-value distribution of these non-significant results across published articles in animal cognition and related fields. We found substantial heterogeneity in how researchers report statistically non-significant p-values in the results sections of articles and in how they interpret them in titles and abstracts. Reporting non-significant results as “No Effect” was common in the titles (84%), abstracts (64%), and results sections (41%) of papers, whereas reporting them as “Non-Significant” was less common in titles (0%) and abstracts (26%) but was present in results sections (52%). Discussions of effect sizes were rare (<5% of articles). A p-value distribution analysis was consistent with research being performed with low statistical power to detect effect sizes of interest. These findings suggest that researchers in animal cognition should pay close attention to the evidence used to support claims of absence of effects in the literature and, in their own work, should report statistically non-significant results clearly and in a formally correct manner, as well as use more formal methods of assessing evidence against theoretical predictions.
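
The point that underpowered studies often yield non-significant p-values even when the null hypothesis is false can be illustrated with a small simulation. The sketch below is not the analysis used in the article. It assumes a two-group design with known unit variance, a hypothetical true standardized effect of d = 0.5, n = 10 animals per group, and alpha = 0.05; all of these values and the function names are illustrative choices. It uses a simple two-sided z-test so that only the Python standard library is needed.

```python
import math
import random


def two_sided_p(z):
    """Two-sided p-value for a standard normal test statistic."""
    return math.erfc(abs(z) / math.sqrt(2))


def simulate_p_values(d, n, n_sims, seed=1):
    """Simulate p-values from a two-group z-test with a true effect d.

    Assumption (illustrative): both groups are normal with known SD = 1,
    so the standard error of the mean difference is sqrt(2 / n).
    """
    rng = random.Random(seed)
    p_values = []
    for _ in range(n_sims):
        control = [rng.gauss(0.0, 1.0) for _ in range(n)]
        treated = [rng.gauss(d, 1.0) for _ in range(n)]
        diff = sum(treated) / n - sum(control) / n
        z = diff / math.sqrt(2.0 / n)
        p_values.append(two_sided_p(z))
    return p_values


# Hypothetical scenario: the null is false (d = 0.5), but samples are small.
p_values = simulate_p_values(d=0.5, n=10, n_sims=5000)
share_nonsig = sum(p >= 0.05 for p in p_values) / len(p_values)
print(f"Share of non-significant results: {share_nonsig:.2f}")
```

Under these assumed settings the test has roughly 20% power, so about four in five simulated studies return p >= 0.05 although a real effect exists. A non-significant result from such a design therefore gives little support to a "No Effect" claim.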
