Power, sample size, effect size, and significance level are linked by a single relationship: pick any three and the fourth is determined.
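For comparing two means with $n$ observations per group, the approximate power of a two-sided test at significance level $\alpha$ is

$$\text{power} \approx \Phi\!\left(\frac{\delta}{\sigma\sqrt{2/n}} - z_{1-\alpha/2}\right),$$

where $\delta$ is the true difference, $\sigma$ is the within-group standard deviation, $\Phi$ is the standard normal CDF, and $z_{1-\alpha/2}$ is the corresponding critical value. Bigger effects and bigger samples both raise power; bigger noise lowers it.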
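As a rough illustration, the normal-approximation power calculation for two means can be sketched in a few lines of Python, together with its inversion for the per-group sample size. The function names `approx_power` and `n_per_group` are hypothetical. The sketch assumes a two-sided test, a known within-group standard deviation, and equal group sizes, and it ignores the negligible chance of rejecting in the wrong direction.

```python
from math import ceil, sqrt
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal: mean 0, standard deviation 1


def approx_power(delta, sigma, n, alpha=0.05):
    """Approximate power of a two-sided, two-sample z-test with n per group.

    delta: true difference in means; sigma: within-group standard deviation.
    """
    z_crit = _STD_NORMAL.inv_cdf(1 - alpha / 2)
    standard_error = sigma * sqrt(2 / n)  # SE of the difference in sample means
    return _STD_NORMAL.cdf(abs(delta) / standard_error - z_crit)


def n_per_group(delta, sigma, power=0.8, alpha=0.05):
    """Smallest n per group whose approximate power reaches the target."""
    z_crit = _STD_NORMAL.inv_cdf(1 - alpha / 2)
    z_power = _STD_NORMAL.inv_cdf(power)
    return ceil(2 * ((z_crit + z_power) * sigma / abs(delta)) ** 2)
```

For example, `n_per_group(0.5, 1.0)` solves for the sample size given the other three quantities, and `approx_power(0.5, 1.0, n)` goes the other way. Halving the effect size roughly quadruples the required sample, because $n$ scales with $(\sigma/\delta)^2$.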
Underpowered studies are a notorious problem. They miss real effects and, when they do find significance, tend to overstate the effect size — a phenomenon known as the winner's curse.
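A small simulation can make the overstatement concrete. The sketch below is illustrative only: the function name `simulate_winners_curse` and all default values (true difference, noise level, group size, number of simulated studies) are hypothetical choices, and the within-group standard deviation is treated as known so that each study reduces to a z-test.

```python
import random
from math import sqrt
from statistics import NormalDist, fmean


def simulate_winners_curse(delta=0.2, sigma=1.0, n=20, alpha=0.05,
                           n_studies=20000, seed=0):
    """Simulate many two-group studies and summarize the significant ones.

    Returns (share of studies reaching significance,
             mean estimated difference among those studies).
    """
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    standard_error = sigma * sqrt(2 / n)  # sigma is assumed known
    significant_estimates = []
    for _ in range(n_studies):
        # The difference of two sample means is normal around the true delta,
        # so draw it directly instead of simulating every observation.
        estimate = rng.gauss(delta, standard_error)
        if abs(estimate) / standard_error > z_crit:
            significant_estimates.append(estimate)
    share_significant = len(significant_estimates) / n_studies
    return share_significant, fmean(significant_estimates)


share, mean_significant = simulate_winners_curse()
print(f"share significant: {share:.3f}")
print(f"mean estimate among significant studies: {mean_significant:.3f}")
```

With these low-power settings only a small share of simulated studies reaches significance, and the average estimate among them sits well above the true difference of 0.2, since an estimate has to be large to clear the significance threshold at all.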
In trading, this matters viscerally: if you're A/B testing two strategies with comparable daily Sharpe ratios and you want to detect a modest true difference in Sharpe at conventional power, you typically need a couple of years of data. Most "I just compared two backtests" exercises are dramatically underpowered.