Monday, 29 October 2018

Phred quality scores

Phred quality scores measure the quality of individual base calls in DNA sequencing.
I always forget how they are calculated.

From Wikipedia, the Phred base quality score for a particular base is:
Q = -10 log_10 (P)
where P = the probability that a particular base was called incorrectly.

Working out the probability that a particular base was called incorrectly from the Phred score
If you have Q, you can work out P as:
P = 10^(-Q /10)
So a base quality score of 20 means P = 10^(-2) = 0.01.
A base quality score of 25 means P = 10^(-2.5) = 0.003 approx.
A base quality score of 30 means P = 0.001.
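
As a quick check of the numbers above, here is a minimal Python sketch of the Q to P conversion. The function name phred_to_error_prob is my own choice for illustration, not something from a sequencing library.

```python
def phred_to_error_prob(q):
    """Convert a Phred quality score Q to the error probability P = 10^(-Q/10)."""
    return 10 ** (-q / 10)


# Print P for the three example scores used above.
for q in (20, 25, 30):
    print(q, phred_to_error_prob(q))
```

Each increase of 10 in Q makes P ten times smaller, which is why Q = 20 and Q = 30 give such round numbers.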

Working out the Phred quality score threshold needed for a particular value of P
If we want every base to have P of at most 0.05, then the minimum Q we need is:
Q = -10 log_10(0.05) = 13.0103.
Working back to get P gives:
P = 10^(-13.0103/10) = 0.05.

Likewise, for a threshold of P = 0.005:
Q = -10 log_10(0.005) = 23.0103.
Working back to get P gives:
P = 10^(-23.0103/10) = 0.005.
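
The same calculation can be sketched in Python, going from P to Q and back again. The function name error_prob_to_phred is hypothetical, chosen just for this example, and the range check on p is my own addition.

```python
import math


def error_prob_to_phred(p):
    """Convert an error probability P to a Phred score Q = -10 * log10(P)."""
    if not 0 < p <= 1:
        raise ValueError("p must be greater than 0 and at most 1")
    return -10 * math.log10(p)


# Work out Q for each threshold of P, then convert back as a sanity check.
for p in (0.05, 0.005):
    q = error_prob_to_phred(p)
    p_back = 10 ** (-q / 10)
    print(p, round(q, 4), round(p_back, 6))
```

Reported quality scores are usually whole numbers, so in practice a threshold of P = 0.05 would probably mean keeping bases with Q of 14 or more, and P = 0.005 would mean Q of 24 or more.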


