I wanted to find out how to do a power calculation to estimate the number of mice needed per treatment group, for testing a drug for anthelmintic activity in mice.
Luckily, I found a nice paper by Marriott et al. (2018), who estimated the number of mice or gerbils needed per treatment group in such drug tests, to detect a drug efficacy of 70% or higher, with a statistical power of >75% and alpha set at 0.05.
Their methods say:
'Power analysis was undertaken using sample means and standard deviations of untreated/vehicle control gerbil or SCID mouse worm burdens combined from 2-3 independent infection or implantation experiments. With the assumption of proportional variation, sample size was calculated for drug efficacy effect sizes of 70% or 90% with a statistical power (1-Beta) of >75 < 90% with alpha set at 0.05 using a two-sample T test (Russ Lenth PiFace Applet)'.
To do this, they used a great Java program called PiFace by the statistician Russ Lenth. This uses a two-sided t-test.
For example, for SCID mice, Marriott et al estimated based on some previous data that the worm burden (number of worms) per mouse is 15.3 (mean) with a standard deviation of 8.7. Achieving an efficacy of a drug of 70% would mean that the mean in the drug-treated mice would be (15.3-0.7*15.3)=4.59. That is, the difference between the mean worm burden in the control and drug-treated mice would be 15.3-4.59=10.71.
Marriott et al say in their methods that they used the 'assumption of proportional variation'. I think this means that they assumed the standard deviation in the drug-treated mice would be smaller than in the control mice, as the mean is smaller in the drug-treated mice. That is, the ratio of standard deviation to mean is assumed to be the same in both groups. Under this assumption, if you get a standard deviation of 8.7 for a mean of 15.3 in the control mice, you would expect a standard deviation of (8.7/15.3)*4.59=2.61 for a mean of 4.59 in the drug-treated mice.
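To make the arithmetic above easy to repeat for other worm burdens or efficacies, here is a small Python sketch. The function name and its arguments are my own, for illustration only; it just encodes my reading of 'proportional variation' (same ratio of standard deviation to mean in both groups), which may not be exactly what Marriott et al did.

```python
def treated_group_parameters(control_mean, control_sd, efficacy):
    """Return (treated_mean, difference, treated_sd) for a given drug efficacy.

    Assumes 'proportional variation' means the SD/mean ratio is the same
    in the control and drug-treated groups (my interpretation).
    """
    # An efficacy of 0.70 means the worm burden is reduced by 70%
    treated_mean = control_mean * (1.0 - efficacy)
    # Difference in mean worm burden between control and treated mice
    difference = control_mean - treated_mean
    # Scale the control SD by the same factor as the mean
    treated_sd = (control_sd / control_mean) * treated_mean
    return treated_mean, difference, treated_sd


# SCID mouse values quoted from Marriott et al: mean 15.3, SD 8.7, efficacy 70%
treated_mean, difference, treated_sd = treated_group_parameters(15.3, 8.7, 0.70)
print(round(treated_mean, 2), round(difference, 2), round(treated_sd, 2))
# prints: 4.59 10.71 2.61
```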
We can enter sigma1=8.7, sigma2=2.6, 'True difference of means'=10 and power=0.75 into the PiFace Java program, and choose 'two-sided test' for a two-sided t-test. This gives an estimated group size of 8, i.e. 8 control mice and 8 drug-treated mice. Note that the picture shows the power to be 0.782: when we enter power=0.75, PiFace finds the smallest sample sizes that reach that power, and then reports the power that those sample sizes actually give.
Note also that here we set the 'True difference of means' to 10, which is the difference of 10.71 that I calculated above, rounded down. When I set the 'True difference of means' to 10, I get a sample size of 8, which is what Marriott et al report in Table 4 of their paper. When I set it to 10.71, I get a slightly different sample size. I think they must have rounded down 10.71 to 10 to do this calculation (?).
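If you would rather script this than click through PiFace, the calculation can be approximated in Python with SciPy. This is only a sketch under my own assumptions: I treat the test as a two-sided two-sample t-test with unequal variances, use the Welch-Satterthwaite approximation for the degrees of freedom, and get the power from the noncentral t distribution. I have not checked that this is exactly what PiFace does internally, and the function names are made up for this example.

```python
import math

from scipy import stats


def welch_power(diff, sd1, sd2, n, alpha=0.05):
    """Approximate power of a two-sided two-sample t-test with unequal SDs.

    n is the number of animals in each group (equal group sizes assumed).
    """
    var1, var2 = sd1 ** 2 / n, sd2 ** 2 / n
    se = math.sqrt(var1 + var2)  # standard error of the difference in means
    # Welch-Satterthwaite approximation to the degrees of freedom
    df = (var1 + var2) ** 2 / (var1 ** 2 / (n - 1) + var2 ** 2 / (n - 1))
    ncp = diff / se  # noncentrality parameter
    t_crit = stats.t.ppf(1 - alpha / 2, df)
    # Probability that the test statistic falls in either rejection region
    return stats.nct.sf(t_crit, df, ncp) + stats.nct.cdf(-t_crit, df, ncp)


def smallest_group_size(diff, sd1, sd2, target_power=0.75, alpha=0.05, max_n=1000):
    """Return (n, power) for the smallest group size reaching target_power."""
    for n in range(2, max_n + 1):
        power = welch_power(diff, sd1, sd2, n, alpha)
        if power >= target_power:
            return n, power
    raise ValueError("target power not reached for n <= max_n")


# Same inputs as typed into PiFace above
n, power = smallest_group_size(diff=10, sd1=8.7, sd2=2.6)
print(n, round(power, 3))
```

Calling smallest_group_size with diff=10.71 instead of 10 lets you see how sensitive the group size is to that rounding.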
By the way, I love the warning given by Russ Lenth in the instructions for his Java program PiFace: "Folks, just because you can plug numbers into a program doesn’t change the fact that if you don’t know what you’re doing, you’re almost guaranteed to get meaningless results – if not dangerously misleading ones. Statistics really is like rocket science; it isn’t easy, even to us who have studied it for a long time."
My colleague Maria Duque also told me about another free program called GPower.
GPower is also very easy to use, and as well as power calculations for t-tests, it can perform power calculations for Mann-Whitney tests, chi-squared tests, ANOVA, etc.
There is a nice talk about GPower with many examples, available here.
Thank you to Russ Lenth for making his lovely Java program PiFace available. Thank you also to my colleague Maria Duque for telling me about GPower.