Thursday 16 May 2024

Testing whether data follow a uniform distribution

Someone asked me how to test whether a data variable, which has values ranging from 1-1000, follows a uniform distribution.

Taking some inspiration from Stack Exchange, I realised that a Kolmogorov-Smirnov test can be used.

 First we can generate one million random numbers from a uniform distribution that ranges from 1-1000:

> y <- runif(1000000,1,1000)

Let's plot a histogram and check their median:

> hist(y, col="blue")

[Figure: histogram of y]

> median(y) 

[1] 500.1832

It is near 500, as we would expect.
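As a quick sanity check on that expectation, the theoretical median of a uniform distribution on 1-1000 is (1 + 1000) / 2 = 500.5, which R can return with qunif(). The sketch below is not part of the original session: the seed is an arbitrary choice added for reproducibility, so the simulated median will differ slightly from the value shown above.

```r
# Theoretical median of a uniform distribution on [1, 1000]
theoretical_median <- qunif(0.5, min = 1, max = 1000)
theoretical_median  # (1 + 1000) / 2 = 500.5

# Simulated sample; the seed is an assumption, chosen only for reproducibility
set.seed(1)
y <- runif(1000000, min = 1, max = 1000)
sample_median <- median(y)

# Distance between the simulated and theoretical medians (should be small)
abs(sample_median - theoretical_median)
```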

 

Then enter the data that we want to compare to this distribution:

> x <- c(200,100,53,99,77,88,32)
 
Then use a Kolmogorov-Smirnov test:
> ks.test(x, y)
 
    Two-sample Kolmogorov-Smirnov test

data:  x and y
D = 0.80089, p-value = 0.0002518
alternative hypothesis: two-sided
 
The test statistic is 0.80089, and the P-value is 0.0002518.
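To see where a value like 0.80089 comes from: the statistic D is the largest vertical gap between the empirical cumulative distribution of 'x' and the distribution it is compared with. The sketch below computes that gap by hand against the theoretical uniform CDF (punif) rather than against the simulated 'y', so it should land very close to, but not exactly on, the value reported above. The variable names are illustrative only.

```r
x <- c(200, 100, 53, 99, 77, 88, 32)

xs <- sort(x)
n  <- length(xs)

# Theoretical uniform CDF on [1, 1000], evaluated at each sorted data point
Fx <- punif(xs, min = 1, max = 1000)

# The empirical CDF jumps from (i - 1)/n to i/n at the i-th sorted point,
# so check the gap on both sides of each jump
D <- max(c((1:n) / n - Fx, Fx - (0:(n - 1)) / n))
D
```

Here the largest gap occurs at the largest observation, 200: all of 'x' lies at or below 200, whereas a uniform distribution on 1-1000 puts only about 20% of its mass there.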
 
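Strictly, the null hypothesis of the two-sample test is that 'x' and 'y' come from the same distribution. Because 'y' was drawn from a uniform distribution from 1-1000, this amounts to the null hypothesis that the data come from a uniform distribution from 1-1000; the alternative hypothesis is that the data do not.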
 
Here the P-value is 0.0002518, which is strong evidence against the null hypothesis, so we reject the null hypothesis in favour of the alternative hypothesis.
 
In other words, we reject the null hypothesis that 'x' comes from a uniform distribution ranging from 1-1000, in favour of the alternative hypothesis (that 'x' does not come from such a distribution).
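The comparison above uses a large simulated sample as a stand-in for the uniform distribution. As an alternative sketch, ks.test() in R also accepts the name of a cumulative distribution function as its second argument, with the distribution's parameters passed after it, which gives a one-sample test against the theoretical uniform distribution directly. This is not what was run above, and the result will differ slightly from the two-sample output, but it avoids the simulation step.

```r
x <- c(200, 100, 53, 99, 77, 88, 32)

# One-sample Kolmogorov-Smirnov test against a uniform distribution on [1, 1000].
# "punif" names the uniform CDF; min and max are passed on to it.
res <- ks.test(x, "punif", min = 1, max = 1000)

res$statistic  # largest gap between the empirical CDF of x and the uniform CDF
res$p.value    # small values are evidence against uniformity on [1, 1000]
```

Note that the Kolmogorov-Smirnov test assumes a continuous distribution, so ks.test() will warn if the data contain ties.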