Every Lab Needs a TOSTER

What feels like a very long time ago, I was trying to empirically investigate imaginative resistance with Nina Strohminger and Chandra Sripada. Our findings have been partly confirmed but also partly challenged by the recent replication effort by Mark Phelan and his student collaborators Wenjia Hu and Navin Rambharose, as part of the X-Phi Replicability Project.

In particular, these replicators challenged our inference from a non-significant comparison to the conclusion that there is no difference between two conditions. Specifically, we inferred that genre does not make a difference to participant responses to questions about characters’ beliefs. They have shown that this inference is a mistake. As they put it, “Contra the original study, in this replication an effect of Genre on Belief was found. […] This was a very small, but significant, difference (t(97)=2.299, p=0.024, effect size r=0.052).” Thanks are due to Florian Cova, who noticed that the effect size should be r=0.227, which is more worrying for us.
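For readers who want to check the corrected effect size themselves, here is a minimal sketch using the standard conversion from a t statistic to r, namely r = sqrt(t² / (t² + df)). The function name is my own, chosen for illustration. With the reported t(97)=2.299, it gives r≈0.227, and the originally reported 0.052 appears to match r squared rather than r.

```python
from math import sqrt


def r_from_t(t, df):
    """Convert a t statistic and its degrees of freedom to the effect size r."""
    return sqrt(t**2 / (t**2 + df))


# Values reported in the replication: t(97) = 2.299
r = r_from_t(2.299, 97)
print(round(r, 3))       # 0.227
print(round(r**2, 3))    # 0.052
```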

Our inference pattern is a common but mistaken one. Daniel Lakens has argued that such “no difference” conclusions need to be supported by equivalence testing. Indeed, using TOSTER, Lakens’s package for the open-source statistical software jamovi, we can see that the inference was already not warranted by the original data.

(In this analysis, I set the lower and upper equivalence bounds to -0.2 and 0.2 Cohen’s dz, which corresponds to the conventional interpretation of a “small” effect. As the graph shows, our original data does not support the inference that there is no difference between the two conditions, because the top end of the 90% confidence interval exceeds the upper bound for equivalence.)
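To make the logic of the two one-sided tests (TOST) concrete, here is a rough sketch for a paired design, which is what Cohen’s dz suggests. It is not the TOSTER implementation: to stay within the Python standard library it uses a normal approximation, whereas a proper analysis would use the t distribution. The function name and the example numbers are hypothetical and are not our study data. Equivalence is declared only when both one-sided tests are significant, which amounts to the 90% confidence interval falling entirely inside the bounds.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev


def tost_paired(diffs, bound_dz=0.2, alpha=0.05):
    """Approximate TOST on paired difference scores, with bounds in Cohen's dz.

    Assumption: a large-sample normal approximation stands in for the t distribution.
    """
    n = len(diffs)
    dz = mean(diffs) / stdev(diffs)          # standardized mean difference
    norm = NormalDist()

    # Test 1: is the effect reliably above the lower bound (-bound_dz)?
    p_lower = 1 - norm.cdf((dz + bound_dz) * sqrt(n))
    # Test 2: is the effect reliably below the upper bound (+bound_dz)?
    p_upper = norm.cdf((dz - bound_dz) * sqrt(n))

    # The matching (1 - 2 * alpha) confidence interval, i.e. 90% when alpha = 0.05
    half_width = norm.inv_cdf(1 - alpha) / sqrt(n)
    ci = (dz - half_width, dz + half_width)

    return {
        "dz": dz,
        "p_tost": max(p_lower, p_upper),
        "ci": ci,
        "equivalent": max(p_lower, p_upper) < alpha,
    }


# Hypothetical difference scores with a mean of exactly zero
large_sample = tost_paired([1.0, -1.0] * 50)   # n = 100
small_sample = tost_paired([1.0, -1.0] * 10)   # n = 20
print(large_sample["equivalent"], small_sample["equivalent"])
```

Note that the second hypothetical sample has an observed effect of exactly zero and still fails the equivalence test, because its confidence interval is too wide to fit inside the bounds. A non-significant difference, or even a null observed difference, is not by itself evidence of no difference.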

I am extremely appreciative of Phelan and colleagues for their efforts in replicating our original study. And, from this experience, I have learned that every lab needs a TOSTER!