I wanted to find out how to do a power calculation to estimate the number of mice needed per treatment group, for testing a drug for anthelmintic activity in mice.

Luckily, I found a nice paper by Marriott et al 2018, who estimated the number of mice or gerbils needed per treatment group in such drug tests to detect a drug efficacy of 70% or higher, with a statistical power of >75% and alpha set at 0.05.

Their methods say:

*'Power analysis was undertaken using sample means and standard deviations of untreated/vehicle control gerbil or SCID mouse worm burdens combined from 2-3 independent infection or implantation experiments. With the assumption of proportional variation, sample size was calculated for drug efficacy effect sizes of 70% or 90% with a statistical power (1-Beta) of >75 < 90% with alpha set at 0.05 using a two-sample T test (Russ Lenth PiFace Applet)'.*

To do this, they used a great Java program called PiFace by the statistician Russ Lenth. This uses a two-sided t-test.

For example, for SCID mice, Marriott et al estimated based on some previous data that the worm burden (number of worms) per mouse is 15.3 (mean) with a standard deviation of 8.7. Achieving an efficacy of a drug of 70% would mean that the mean in the drug-treated mice would be (15.3-0.7*15.3)=4.59. That is, the difference between the mean worm burden in the control and drug-treated mice would be 15.3-4.59=10.71.
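The arithmetic above can be written as a small Python sketch. The function name `treated_mean` is my own, chosen for illustration. The 90% case is included only because Marriott et al also considered that effect size.

```python
def treated_mean(control_mean, efficacy):
    """Mean worm burden expected in treated mice for a given efficacy (0 to 1)."""
    return control_mean * (1 - efficacy)


control_mean = 15.3  # mean worm burden in untreated SCID mice (Marriott et al)

for efficacy in (0.70, 0.90):
    mean_treated = treated_mean(control_mean, efficacy)
    difference = control_mean - mean_treated
    print(f"efficacy={efficacy:.0%}  treated mean={mean_treated:.2f}  "
          f"difference={difference:.2f}")
```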

Marriott et al say in their methods that they used the 'assumption of proportional variation'. I think this means that they assumed the standard deviation in the drug-treated mice would be smaller than in the control mice, as the mean is smaller in the drug-treated mice. That is, the standard deviation is assumed to scale in proportion to the mean. Under this assumption, if you get a standard deviation of 8.7 for a mean of 15.3 in the control mice, you would expect a standard deviation of (8.7/15.3)*4.59=2.61 for a mean of 4.59 in the drug-treated mice.
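As a minimal sketch of this scaling, assuming my reading of 'proportional variation' is right (a constant ratio of standard deviation to mean), the calculation looks like this in Python. The function name `proportional_sd` is hypothetical.

```python
def proportional_sd(control_mean, control_sd, new_mean):
    """Scale the standard deviation in proportion to the mean.

    Assumes a constant coefficient of variation (sd / mean), which is my
    interpretation of 'proportional variation', not something the paper spells out.
    """
    coefficient_of_variation = control_sd / control_mean
    return coefficient_of_variation * new_mean


sd_treated = proportional_sd(control_mean=15.3, control_sd=8.7, new_mean=4.59)
print(round(sd_treated, 2))
```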

We can put the numbers sigma1=8.7, sigma2=2.6 (2.61 rounded) and 'True difference of means'=10 into the PiFace Java program, set power=0.75, and choose 'two-sided test' for a two-sided t-test. PiFace then estimates that we need a group size of 8, i.e. 8 control mice and 8 drug-treated mice. Note that PiFace shows the power as 0.782 rather than 0.75: when we enter power=0.75, it calculates the minimum sample size that gives at least that power, and then reports the power actually achieved with that sample size.

Note also that here we set the 'True difference of means' to 10, which rounds down the difference of 10.71 that I calculated above. When I set it to 10, I get a sample size of 8, which is what Marriott et al report in Table 4 of their paper. When I set it to 10.71, I get a slightly different sample size. I think they must have rounded 10.71 down to 10 for this calculation, but I am not certain.
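For readers who want to cross-check this kind of calculation without PiFace, here is a rough Python sketch using only the standard library. It is not PiFace's code. I am assuming that a two-sided Welch t-test, with Satterthwaite degrees of freedom and a noncentral t distribution, is a reasonable stand-in for what PiFace does when the two sigmas differ. All function names are my own.

```python
import math
from statistics import NormalDist

PHI = NormalDist().cdf  # standard normal CDF


def simpson(f, a, b, n=2000):
    """Composite Simpson's rule on [a, b]; n must be even."""
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3


def nct_cdf(t, df, ncp):
    """P(T <= t) for a noncentral t variable with df > 1 degrees of freedom.

    Uses T = (Z + ncp) / S with S = sqrt(chi2_df / df), so that
    P(T <= t) = E[Phi(t * S - ncp)], integrated numerically over S.
    """
    log_c = math.log(2) + (df / 2) * math.log(df / 2) - math.lgamma(df / 2)

    def integrand(s):
        if s <= 0:
            return 0.0  # the density of S vanishes at 0 when df > 1
        log_density = log_c + (df - 1) * math.log(s) - df * s * s / 2
        return PHI(t * s - ncp) * math.exp(log_density)

    return simpson(integrand, 0.0, 6.0)


def t_quantile(p, df):
    """Quantile of the central t distribution (for p > 0.5), by bisection."""
    lo, hi = 0.0, 100.0
    for _ in range(40):
        mid = (lo + hi) / 2
        if nct_cdf(mid, df, 0.0) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def welch_power(diff, sd1, sd2, n, alpha=0.05):
    """Approximate power of a two-sided Welch t-test with n animals per group."""
    v1, v2 = sd1 ** 2 / n, sd2 ** 2 / n
    df = (v1 + v2) ** 2 / ((v1 ** 2 + v2 ** 2) / (n - 1))  # Satterthwaite
    ncp = diff / math.sqrt(v1 + v2)  # noncentrality parameter
    t_crit = t_quantile(1 - alpha / 2, df)
    return 1 - nct_cdf(t_crit, df, ncp) + nct_cdf(-t_crit, df, ncp)


def min_group_size(diff, sd1, sd2, target_power=0.75, alpha=0.05, max_n=200):
    """Smallest n per group whose approximate power reaches target_power."""
    for n in range(2, max_n + 1):
        if welch_power(diff, sd1, sd2, n, alpha) >= target_power:
            return n
    return None


n = min_group_size(diff=10, sd1=8.7, sd2=2.6)
print(n, round(welch_power(10, 8.7, 2.6, n), 3))
```

With the inputs used above, this sketch should land on a group size of 8 with a power close to the 0.782 that PiFace reports. Changing `diff` from 10 to 10.71 is a quick way to see how sensitive the group size is to that rounding.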

By the way, I love the warning given by Russ Lenth in the instructions for his Java program PiFace:

*"Folks, just because you can plug numbers into a program doesn’t change the fact that if you don’t know what you’re doing, you’re almost guaranteed to get meaningless results – if not dangerously misleading ones. Statistics really is like rocket science; it isn’t easy, even to us who have studied it for a long time."*

**GPower software**

My colleague Maria Duque also told me about another free program called GPower.

GPower is also very easy to use, and as well as power calculations for t-tests, it can perform power calculations for Mann-Whitney tests, chi-squared tests, ANOVA, etc.

There is a nice talk about GPower with many examples, available here.

**Acknowledgements**

Thank you to Russ Lenth for making his lovely Java program PiFace available. Thank you also to my colleague Maria Duque for telling me about GPower.