Karl Popper used the black swan as a simple example of how to falsify the hypothesis that all swans are white. But how do psychologists falsify hypotheses about more complex human abilities and behaviour? Jim Kennedy has just posted online our new paper on how to plan falsifiable confirmatory research. Obtaining evidence that a hypothesis is true or false is a basic goal of science. However, most research has been designed to obtain evidence that a hypothesis is true, without the design features needed to obtain evidence that it is false. Many researchers appear to be unfamiliar with the methods that can provide such disconfirming evidence. Research that can support a hypothesis but cannot falsify it is biased science.
Even with the recent extensive discussions of methodological biases in psychological research, most of the proposed solutions would still allow some deficient methodological practices to continue. The ongoing debates about statistical methods for replication studies, and about the value of retrospective meta-analysis, show that methodological opinions are currently heading in many different directions rather than converging toward a consensus. The debates between advocates of the "new statistics" and advocates of hypothesis tests are a clear example.
We believe that falsifiable research provides a conceptual framework for resolving these debates and implementing optimal research methods. Although we have not found existing articles that provide useful discussions of the rationale and practical methods for implementing falsifiable research, we were unable to get our ideas published in either of two psychology journals. One editor said the ideas are not sufficiently novel to be published. We believe that the present paper provides valuable guidelines that are needed in psychological research—whether or not the various ideas in the paper are considered novel. The Abstract follows, and if you would like to read the full paper, the link is at the end.
Psychologists generally recognize falsifiable research as a basic goal of science. However, the methods for conducting falsifiable research with classical statistics, and the related methods for planning optimal Bayesian analyses, have not yet been recognized and implemented by psychological researchers.

The first step for falsifiable research is selection of a minimum effect size of interest, such that a smaller effect would be too small to be of interest or would be evidence that the hypothesis is false. If a minimum effect of interest is not specified explicitly, the effect size that just meets the criterion for acceptable evidence will implicitly function as the minimum effect of interest (e.g., the effect size that gives p = .05 or Bayes factor = 3). For confirmatory research, researchers should know what effect size is functioning as the minimum effect of interest.

The second step is to determine the sample size that has power of at least .95 for the minimum effect of interest. Failure to obtain a significant result with power of .95 is evidence that the predicted effect specified in the power analysis is false for the conditions of the study. Such a failure can be considered rejection of the alternative hypothesis at the .05 level, using logic analogous to rejecting the null hypothesis. Evaluating the operating characteristics or power curve for a planned analysis reveals the effect sizes that can be reliably detected in a study and is needed for Bayesian methods as well as for classical methods. Specifying the effect sizes that can be reliably detected is as important as specifying the subject population.

The third step is to publicly preregister the study with specific numerical inference criteria for evidence that the effect does not occur, in addition to the usual criteria for evidence that the effect does occur. Studies with lower power may be conducted, but the effect size with power of .95 is the falsifiable effect size for a study.
Recent large studies have had adequate sample sizes for these methods. The relationships between these methods and meta-analyses, the “new statistics,” and common practices for power analysis are discussed. Falsifiable research provides a conceptual framework for resolving many debates about methodology for confirmatory research.
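The arithmetic behind the second step can be sketched in a few lines. The example below uses a standard normal-approximation formula for a two-sided, two-sample t-test; the minimum effect of interest (Cohen's d = 0.3), alpha = .05, and power = .95 are illustrative assumptions, not values from the paper.

```python
from statistics import NormalDist  # standard library, Python 3.8+

def sample_size_per_group(d, alpha=0.05, power=0.95):
    """Normal-approximation sample size per group for a two-sided
    two-sample t-test to detect standardized effect size d."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)   # critical value for two-sided alpha
    z_beta = z.inv_cdf(power)            # quantile for desired power
    return 2 * (z_alpha + z_beta) ** 2 / d ** 2

def power_at(d, n_per_group, alpha=0.05):
    """Approximate power of the same test at effect size d,
    given n_per_group subjects in each group."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)
    return 1 - z.cdf(z_alpha - d * (n_per_group / 2) ** 0.5)

# Planned sample size for the assumed minimum effect of interest
n = sample_size_per_group(0.3)           # roughly 289 per group
print(round(n))

# A simple power curve: which effect sizes this design can reliably detect
for d in (0.1, 0.2, 0.3, 0.5):
    print(d, round(power_at(d, n), 3))
```

With this design, effects at or above d = 0.3 are detected with power ≥ .95, while power falls off quickly for smaller effects, which is exactly the information the abstract says a power curve should make explicit before preregistration.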
A PDF of the full paper is here.