Replacing an entrenched method in scientific research is difficult, even when the method is problematic. Such is the case with shifting from research studies based on the null hypothesis to a more realistic method of estimation.
Null hypothesis significance testing is a statistical method that assumes that no effect exists, such as no association between two variables or no difference between two experimental groups. This is the classical null hypothesis.
The researcher calculates a p-value (a probability) from the experimental data and compares it with a critical threshold; on that basis, the observed effect is judged likely or unlikely to be real rather than a product of experimental error and chance.
Unfortunately, inferring the presence or absence of effects based on significance testing gives researchers a false sense of certainty and causes many problems when interpreting experimental results.
The case for shifting from null hypothesis significance testing to an estimation approach is advocated by University of Basel evolutionary biologists Daniel Berner and Valentin Amrhein. They argue that a scientific paradigm shift away from the focus on statistical significance is under way, but that "this development has been largely ignored by evolutionary biologists."
Their article on the subject appears in the Journal of Evolutionary Biology, May 18, 2022.
The authors note in their introduction that more than 800 scientists signed on to a commentary, titled online "Scientists rise up against statistical significance," published in the journal Nature, March 20, 2019. (Co-author Amrhein was the lead writer of that article.) However, as they document in this article, 48 papers randomly selected from the 2020 Journal of Evolutionary Biology presented data in terms of the traditional and questionable method of statistical significance.
What's wrong with the null hypothesis approach
Daniel Berner explained the difference between estimation and significance by noting, "Both approaches deal with observed 'effects,' with differences or relationships found in our studies and experiments. Estimation statistics emphasize the size of an observed effect itself, and how precisely it was estimated. Significance testing is usually used for obtaining a yes-or-no answer as to whether an observed effect is 'present' or 'real.'"
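To make the contrast concrete, here is a minimal Python sketch on simulated data. The effect size, sample sizes, and use of a normal approximation are illustrative assumptions, not details from the paper:

```python
import random
import statistics as stats
from math import sqrt, erf

# Simulated two-group experiment (all numbers made up for illustration).
random.seed(0)
control = [random.gauss(0.0, 1.0) for _ in range(40)]
treated = [random.gauss(0.4, 1.0) for _ in range(40)]

diff = stats.mean(treated) - stats.mean(control)
se = sqrt(stats.variance(treated) / 40 + stats.variance(control) / 40)

# Significance framing: a yes-or-no verdict from a p-value
# (two-sided, normal approximation to a two-sample test).
p = 2 * (1 - 0.5 * (1 + erf(abs(diff / se) / sqrt(2))))
verdict = "significant" if p < 0.05 else "not significant"
print(f"significance framing: p = {p:.3f} -> {verdict}")

# Estimation framing: the effect size and its precision.
lo, hi = diff - 1.96 * se, diff + 1.96 * se
print(f"estimation framing: difference = {diff:.2f}, 95% CI [{lo:.2f}, {hi:.2f}]")
```

The same data support both outputs; the difference is which one the researcher foregrounds.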
Berner and Amrhein present four main problems with this "yes-or-no" approach in biology.
"First, even when repeating an experiment under similar conditions, resulting p-values vary strongly from study to study, hence dichotomous inference using significance thresholds is usually unjustified," they write. In practice, one study may find a significant effect while a replication of it finds a non-significant one, and these differing outcomes may erroneously be interpreted as conflicting evidence.
"Second, ‘statistically significant’ results have overestimated effect sizes, a bias declining with increasing statistical power." This bias arises because samples that by chance capture an exaggerated effect are more likely to cross the significance threshold.
"Third, ‘statistically non-significant’ results have underestimated effect sizes, and this bias gets stronger with higher statistical power.
"Fourth, the tested statistical hypotheses usually lack biological justification and are often uninformative."
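The first problem, the study-to-study variability of p-values, can be shown with a small simulation. This is a sketch under assumed conditions (a fixed modest effect, equal group sizes, a normal approximation), not an analysis from the paper:

```python
import random
import statistics as stats
from math import sqrt, erf

def p_two_sample(a, b):
    """Two-sided p-value for a difference in means (normal approximation)."""
    se = sqrt(stats.variance(a) / len(a) + stats.variance(b) / len(b))
    z = (stats.mean(a) - stats.mean(b)) / se
    return 2 * (1 - 0.5 * (1 + erf(abs(z) / sqrt(2))))

random.seed(1)
true_effect = 0.5          # the same real, modest effect in every replication
pvals = []
for _ in range(20):        # twenty replications of one identical experiment
    control = [random.gauss(0, 1) for _ in range(30)]
    treated = [random.gauss(true_effect, 1) for _ in range(30)]
    pvals.append(p_two_sample(treated, control))

significant = sum(p < 0.05 for p in pvals)
print(f"p-values span {min(pvals):.3f} to {max(pvals):.3f}")
print(f"{significant} of 20 identical experiments crossed p < 0.05")
```

Although every replication studies exactly the same effect, the p-values scatter widely, so a significant and a non-significant result need not be in conflict.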
Berner summarized why the significance testing approach is not helpful for science.
"It's because there are too many ways in which a relevant effect can produce a statistically non-significant test result, and an unimportant effect can turn out statistically significant," he said. "In many cases it is impossible to judge from one single study whether an observed effect is 'real.'"
"Further," he added, "filtering test results based on a significance threshold usually leads us to overestimate the size of effects, because overestimated effect sizes are more likely to become statistically significant than more realistic effect size estimates. Significance testing leads to overconfidence in scientific inference."
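The filtering bias Berner describes can be simulated directly. In this sketch (true effect, sample size, and number of studies are all illustrative assumptions), studies whose estimates happen to be inflated are exactly the ones that clear the significance threshold:

```python
import random
import statistics as stats
from math import sqrt, erf

random.seed(2)
true_effect = 0.3
n = 25  # small samples -> low power -> strong selection bias

def run_study():
    """One simulated experiment: return (estimated effect, p-value)."""
    control = [random.gauss(0, 1) for _ in range(n)]
    treated = [random.gauss(true_effect, 1) for _ in range(n)]
    diff = stats.mean(treated) - stats.mean(control)
    se = sqrt(stats.variance(treated) / n + stats.variance(control) / n)
    p = 2 * (1 - 0.5 * (1 + erf(abs(diff / se) / sqrt(2))))
    return diff, p

results = [run_study() for _ in range(5000)]
all_effects = [d for d, _ in results]
signif_effects = [d for d, p in results if p < 0.05]

print(f"true effect:                     {true_effect}")
print(f"mean estimate, all studies:      {stats.mean(all_effects):.3f}")
print(f"mean estimate, significant only: {stats.mean(signif_effects):.3f}")
```

Averaged over all studies the estimates are unbiased, but the subset that passed the significance filter systematically overstates the true effect.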
An engrained approach
The use of significance testing has been around in science for decades. Berner said, "the problems with statistical inference based on significance testing were recognized surprisingly early, in the 1950s-1960s, or even before. But only within the last five to 10 years have critical voices become so numerous and loud that they are almost impossible to ignore."
Other scientific areas began the paradigm shift to estimation in the last few years, so why is evolutionary biology so far behind?
Berner offered this explanation.
"My impression is that research fields like epidemiology or psychology have been particularly open to shifting from testing to estimation," he said. "Perhaps these are fields in which researchers are particularly well trained in statistics. In evolutionary biology, as in many other fields in life science for that matter, statistical practice is often passed from supervisor to student, hence methodological change may be slower."
For this reason, Berner said, he was not really surprised by the sampling results he and Amrhein got from the 2020 papers in the Journal of Evolutionary Biology. "Significance testing is still everywhere: in research articles, in presentations given by researchers," he noted. "It was actually my experience and frustration as a reviewer of journal manuscripts that eventually motivated me to write this article."
Shifting the paradigm
Berner and Amrhein suggest ways that scientists can participate in the paradigm shift and improve scientific accuracy, using a more realistic form of statistical inference.
"Established researchers often have a hard time changing their statistical practice," Berner said. "Hence it is crucial that young researchers learn the problems inherent in significance testing. Teaching statistics plays a key role. In addition I think journal editors have a great potential for contributing to the paradigm shift, as they can directly influence the reporting culture within the scientific literature."
For his part, Berner said, "I will continue teaching estimation statistics to biology students and discussing these topics with the students and colleagues I interact with. My co-author, Valentin Amrhein, also an evolutionary biologist, wrote and continues to write papers and commentaries on statistics reform, together with researchers from many other disciplines. This is really a problem concerning many, if not most, of the quantitative sciences."
The reason that change is so difficult for many scientists, Berner emphasized, is that "the shift to estimation statistics is primarily a change in mindset. Usually no new techniques are required. The shift will facilitate the way we present and understand scientific results, and it will make communication about science more trustworthy."
The article concludes that there are many options "for effectively describing and communicating effect estimates and their uncertainty," the authors write. "Our task for the future is to exploit and to teach these options creatively, keeping in mind that all approaches have their strengths and weaknesses and answer slightly different questions, and that probably none of them is universally applicable or necessarily superior.
"What we hope to have made clear with this note, however, is that we can safely give up null hypothesis significance testing and the reporting of ‘statistical significance,’" they add. "Doing so will help overcome problems with which science has struggled for decades."
-----
D. Berner & V. Amrhein, Why and how we should join the shift from significance testing to estimation, Journal of Evolutionary Biology, May 18, 2022