COMPARING AN OBSERVED PERCENTAGE TO A THEORETICAL PERCENTAGE: THE BINOMIAL TEST

We went into the field and recorded the proportion of individuals expressing a certain trait in a subpopulation (so we only have 2 categories: trait expressed or not), and we want to compare it to the proportion we would expect from what we know about a larger population. Sounds like a reasonable request. Let’s use the binomial test for that. Unlike the chi-square test, this test works with small sample sizes, but do not expect miracles: with only a few observations it can only detect huge differences.

For example, we went into the field and sampled students. (Yes, students make perfect experimental guinea pigs… Before human resources contacts me, I would just like to point out that the previous sentence was a joke!… But seriously… Fantastic guinea pigs…). We were able to catch 10 students, and 3 of them were lefties. We know that, in theory, 10% of humans are lefties. Is the proportion we observed in our “study” due to randomness in the sampling, or are we facing the birth of a new lefties’ nation?!?

The binomial test is performed with the function “binom.test()” (damn, this is really well thought out once again!). We need to enter as arguments the number of observed individuals in the category we are interested in (here 3), our total sample size (here 10), and the proportion of such individuals expected in the theoretical population (here 0.1, passed as “p”):

# 3 lefties out of 10 students, tested against an expected proportion of 0.1
binom.test(3, 10, p = 0.1)

        Exact binomial test

data:  3 and 10
number of successes = 3, number of trials = 10, p-value = 0.07019
alternative hypothesis: true probability of success is not equal to 0.1
95 percent confidence interval:
 0.06673951 0.65245285
sample estimates:
probability of success 
                   0.3 

With a p-value of 0.07, above the usual 0.05 threshold, we cannot reject the null hypothesis “the real proportion of lefties in the population our students were sampled from is equal to 0.1”. However, the test also tells us that, with 95% confidence, this true proportion lies somewhere between 0.067 (6.7%) and 0.652 (65.2%)… Which says a lot about how little we know. We need more data! Sample all the students!!!!
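To see where the p-value of 0.07019 comes from, here is a minimal sketch that rebuilds it by hand. It assumes the usual definition of the two-sided exact test: add up the probabilities, under the null hypothesis, of every outcome that is no more likely than the one we observed. The variable names are only illustrative, and the small tolerance is there to guard against floating-point ties.

```r
# Rebuild the two-sided p-value of binom.test(3, 10, p = 0.1) by hand.
n  <- 10   # students caught
k  <- 3    # lefties among them
p0 <- 0.1  # proportion expected under the null hypothesis

# Probability of each possible count of lefties (0 to 10) if p0 is true
probs <- dbinom(0:n, size = n, prob = p0)

# Sum the probabilities of all outcomes at most as likely as the observed one
# (probs[k + 1] is the probability of exactly k lefties, since counts start at 0)
p_by_hand <- sum(probs[probs <= probs[k + 1] * (1 + 1e-7)])

p_builtin <- binom.test(k, n, p = p0)$p.value

print(round(p_by_hand, 5))              # 0.07019, as in the output above
print(all.equal(p_by_hand, p_builtin))  # TRUE
```

Here only the counts of 3 lefties and more are as unlikely as what we observed, so the two-sided p-value is simply the probability of catching at least 3 lefties among 10 students.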

Let’s say that we sampled all the students on campus (19644 of them), and found that 5723 of them were lefties.

# 5723 lefties out of 19644 students, tested against the same expected proportion
binom.test(5723, 19644, p = 0.1)

        Exact binomial test

data:  5723 and 19644
number of successes = 5723, number of trials = 19644, p-value < 2.2e-16
alternative hypothesis: true probability of success is not equal to 0.1
95 percent confidence interval:
 0.2849875 0.2977458
sample estimates:
probability of success 
             0.2913358 

We can now affirm that the proportion of lefties among our students is significantly (p < 2.2e-16) different from 0.1. And we are 95% confident that the true proportion of lefties in the population these students represent lies between 0.285 and 0.298. I guess this leaves us no option but to get used to weirdly shaped scissors.
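The contrast between the two examples, and the earlier warning that small samples only detect huge differences, can be made a bit more concrete. The sketch below estimates how often “binom.test()” would reject the null hypothesis for a given sample size. It rests on assumptions that are not in the examples above: a 5% significance threshold, a hypothetical true proportion of 0.3, a sample size of 200 chosen for illustration, and a helper called “power_at” that is made up for this purpose.

```r
# Sketch: chance that binom.test() rejects p = 0.1 at the 5% level,
# for different sample sizes and a hypothetical true proportion of lefties.
p0    <- 0.1   # theoretical proportion from the text
alpha <- 0.05  # assumed significance threshold

# power_at() is an illustrative helper, not a base R function
power_at <- function(n, p_true) {
  counts <- 0:n
  # p-value binom.test() would return for every possible count of lefties
  pvals <- sapply(counts, function(k) binom.test(k, n, p = p0)$p.value)
  # probability, under p_true, of landing on a count that leads to rejection
  sum(dbinom(counts, size = n, prob = p_true)[pvals < alpha])
}

power_small <- power_at(10,  p_true = 0.3)
power_large <- power_at(200, p_true = 0.3)
print(c(n10 = power_small, n200 = power_large))
```

Under these assumptions, a sample of 10 students would flag a true proportion of 0.3 only about one time in three, while a few hundred students would flag it almost every time.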
