Smarter hypothesis testing with statistics: How e-values can improve scientific research

Imagine this: you repeatedly bet an amount of your choice on a fair coin, which has an
equal chance of landing heads or tails. If it lands on heads, you double your
money; if it lands on tails, you lose it. On average, you expect to break
even—it's a fair bet. You start with €1 and, each round, bet everything you
have. If you happen to get heads six times in a row, you'll end up with €64
and might start to wonder: is this coin really fair? That is the concept behind
e-values: they help you assess whether an assumption still holds.
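
The doubling arithmetic in this example can be checked with a few lines of Python. This is only an illustrative sketch: the function name `wealth_after` and the `stake_fraction` parameter are made up for this example and do not come from the research described here.

```python
def wealth_after(outcomes, stake_fraction=1.0, start=1.0):
    """Wealth after betting on heads at fair odds.

    outcomes: iterable of booleans, True for heads and False for tails.
    stake_fraction: share of current wealth staked each round (1.0 = all-in).
    """
    wealth = start
    for heads in outcomes:
        stake = stake_fraction * wealth
        # A win pays out the stake once more; a loss forfeits the stake.
        wealth += stake if heads else -stake
    return wealth


print(wealth_after([True] * 6))            # 64.0
print(wealth_after([True] * 8))            # 256.0
print(wealth_after([True, True, False]))   # 0.0, a single tails wipes out an all-in bettor
```

Staking less than everything (for example `stake_fraction=0.5`) grows the money more slowly but survives an occasional tails, which is why practical betting strategies rarely go all-in.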

The e-value (where 'e' stands for expected value) offers an alternative to the
p-value (with 'p' standing for probability), which researchers traditionally
use to test their hypotheses. The p-value comes with a major limitation: in
principle, you're only supposed to draw conclusions once you've collected all
your data. If you later decide to add more measurements, your statistical
analysis is no longer valid.

"A lot of researchers still do it anyway, especially when their p-value is just
not quite small enough," says Lardy. This increases the risk of drawing
the wrong conclusion. E-values, on the other hand, remain statistically sound
even when you add extra data or adjust your analysis plan as you go along.
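
A small simulation can make this difference concrete. The sketch below is a toy setup chosen for illustration, not the method from Lardy's thesis: it flips a genuinely fair coin, checks after every flip whether a two-sided z-test (normal approximation) gives p < 0.05, and in parallel tracks the wealth of a bettor who splits €1 between a "heads" and a "tails" strategy, each staking an assumed 20% of its wealth per round. All function names and the values `fraction=0.2`, `first_look=10` and `n_flips=200` are assumptions.

```python
import math
import random


def peeking_p_value_rejects(flips, alpha=0.05, first_look=10):
    """True if a two-sided z-test looks 'significant' at any interim look."""
    heads = 0
    for n, x in enumerate(flips, start=1):
        heads += x
        if n < first_look:
            continue
        z = (heads - n / 2) / math.sqrt(n / 4)
        p_value = math.erfc(abs(z) / math.sqrt(2))  # normal approximation
        if p_value < alpha:
            return True
    return False


def e_process_rejects(flips, alpha=0.05, fraction=0.2):
    """True if the betting wealth ever reaches 1/alpha."""
    wealth_heads = 1.0  # strategy that keeps staking on heads
    wealth_tails = 1.0  # mirror strategy that keeps staking on tails
    for x in flips:
        outcome = 1 if x == 1 else -1
        wealth_heads *= 1 + fraction * outcome
        wealth_tails *= 1 - fraction * outcome
        # Splitting the starting euro evenly over both strategies.
        e_value = 0.5 * (wealth_heads + wealth_tails)
        if e_value >= 1 / alpha:
            return True
    return False


def false_alarm_rates(n_experiments=2000, n_flips=200, seed=0):
    """Share of fair-coin experiments that each method wrongly flags."""
    rng = random.Random(seed)
    p_hits = e_hits = 0
    for _ in range(n_experiments):
        flips = [rng.randint(0, 1) for _ in range(n_flips)]
        p_hits += peeking_p_value_rejects(flips)
        e_hits += e_process_rejects(flips)
    return p_hits / n_experiments, e_hits / n_experiments


if __name__ == "__main__":
    p_rate, e_rate = false_alarm_rates()
    print(f"false alarms, p-value with peeking: {p_rate:.3f}")
    print(f"false alarms, e-value with peeking: {e_rate:.3f}")
```

Because the coin is fair, every rejection is a false alarm. Repeatedly checking the p-value should push the false-alarm rate well above the nominal 5%, whereas the betting wealth is expected to cross the threshold of 20 in at most 5% of experiments no matter how often it is inspected.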

Lardy's supervisor, Peter Grünwald, has been studying e-values for years. Grünwald
explains, "You can think of the e-value as the amount of money you would
earn from bets like the one in the example." The higher the e-value, the
stronger the evidence against your original assumption ("The coin is
fair"). That makes e-values especially useful in fields like medicine and
psychology, where researchers often face complex situations and need
flexibility in how they handle data.
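
One standard way to make "stronger evidence" precise, given here as background rather than as part of the interview: an e-value $E$ is a non-negative quantity whose expected value is at most 1 when the original assumption $H_0$ is true. Markov's inequality then limits how often a large e-value can arise by chance:

```latex
\[
  \Pr_{H_0}\!\left(E \ge \tfrac{1}{\alpha}\right)
  \;\le\; \alpha\,\mathbb{E}_{H_0}[E]
  \;\le\; \alpha ,
  \qquad 0 < \alpha \le 1 .
\]
```

For instance, turning €1 into €64 corresponds to $\alpha = 1/64 \approx 0.016$: if the coin really is fair, a bettor ends up with that much money in at most about 1.6% of cases.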

A general recipe for e-values can be very complex

By now, there's a general method for calculating an optimal e-value. But in
practice, that method isn't always easy to apply. "That's why I looked
into how to design a good e-value for these kinds of complex problems,"
says Lardy. "What recipe should someone follow to end up with a meaningful
number at the end of their experiment?"

One concrete example is testing whether a medicine works while taking into account
factors like the patient's age or gender. "In clinical trials, you
usually know exactly how the treatment is assigned—one half of the patients
receives the medicine, the other half a placebo. You can use that knowledge to
build an optimal e-value," Lardy explains.

Netflix is already using it; now the universities need to catch up

For now, p-values are still the norm in most university programs. Will we ever
fully switch to e-values? According to Grünwald, there are still a few hurdles
to overcome. "The theory is there, but we now need to develop the
practical tools. We've got beautiful formulas, but we still need good software
to go with them."

There's also the matter of catching up: p-values have been standard practice for
decades. "A lot of people know about their limitations, but still stick to
what they're familiar with."

Even so, Lardy sees signs of progress. Tech companies like Netflix are already using
e-values, for instance, to test whether users are more likely to click on a red
button or a gray one. Lardy and Grünwald hope that one day, e-values will make
their way into university textbooks—so that future students learn from the
start that they might be better off using e-values to test their hypotheses.

Lardy will defend his thesis, titled "Optimal Test Statistics for
Anytime-Valid Hypothesis Tests," on 18 June in the Academy Building. His
supervisors are Grünwald and Wouter Koolen-Wijkstra.