COMPARING MORE THAN TWO SAMPLES: KRUSKAL-WALLIS RANK SUM TEST

“Blah blah blah Flo! You talk nice and pretty, but our data are crappy: we have fewer than 30 records per sample, and they don’t look normal! What should we do then, Big Cheese?!”

Déjà vu…

The Kruskal-Wallis rank sum test is the non-parametric equivalent of a single-factor ANOVA, used when you have “crappy” data (when you have few individuals in at least one of your samples and your data are not normally distributed). It is an extension of the Mann-Whitney-Wilcoxon test to more than two samples. We can use the same type of formula as with the “aov()” function:

kruskal.test(rating ~ actor, data = film)

        Kruskal-Wallis rank sum test

data:  rating by actor
Kruskal-Wallis chi-squared = 8.7367, df = 3, p-value = 0.033

Or, if each sample is entered as a separate vector of values, we can coerce them into a list:

Creating a rating vector for each actor through a logical test (check Chapter 5 for more details):

SS=film$rating[film$actor=="SS"]
AS=film$rating[film$actor=="AS"]
BW=film$rating[film$actor=="BW"]
JCVD=film$rating[film$actor=="JCVD"]

Test:

kruskal.test(list(SS,AS,BW,JCVD))

        Kruskal-Wallis rank sum test

data:  list(SS, AS, BW, JCVD)
Kruskal-Wallis chi-squared = 8.7367, df = 3, p-value = 0.033
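
As a shortcut, the split() function can build that list for us in one line, since it cuts a vector into a list of groups:

kruskal.test(split(film$rating, film$actor))

This returns exactly the same result as the two calls above.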

If the test detects a statistically significant imbalance in your dataset (i.e. at least one of the groups is different from the others), you can identify which group(s) is(are) responsible by comparing the groups two by two, with a Wilcoxon rank-sum test for each pair. If you decide to proceed this way, remember that you will have to account for the fact that you are multiplying tests.

Let’s assume that you have 3 groups (A, B and C) and that you have detected that at least one of them is different from the others; you now need to compare the groups with each other. If you run 3 tests (comparing A to B, A to C, and B to C), each with a 5% probability of being wrong, your total probability of at least one of them being wrong is now equal to 1 minus the probability of all of them being right: 1-0.95*0.95*0.95=0.14!
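
You can check this arithmetic directly in R:

1-0.95^3
[1] 0.142625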

In order to make sure that, overall, we keep the risk of a false positive at 5%, we need to lower the individual probability threshold used to consider each test statistically significant. This is called the Bonferroni correction: we divide our original significance level by the total number of comparisons performed. In the case of 3 groups, that’s 3 comparisons (A-B, A-C, B-C), so we divide 0.05 by 3. If we have 4 groups, we now have 6 comparisons to do (A-B, A-C, A-D, B-C, B-D, C-D) and we would have to use a significance level of 0.05/6=0.0083… As you can see, as the number of groups increases, the adjustment becomes incredibly strict.
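
If you don’t want to count the pairs by hand, base R’s choose() function gives the number of possible pairs, and from there the adjusted threshold:

choose(4, 2)        # number of pairwise comparisons among 4 groups
[1] 6

0.05/choose(4, 2)   # Bonferroni-adjusted significance level
[1] 0.008333333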

I’m going to directly extract, for each test, the corresponding p-value (using $p.value):

wilcox.test(SS, AS)$p.value     
[1] 0.6453768

wilcox.test(SS, BW)$p.value     
[1] 0.8883587

wilcox.test(SS, JCVD)$p.value   
[1] 0.02739613

wilcox.test(AS, BW)$p.value     
[1] 0.3356275
Warning message:
In wilcox.test.default(AS, BW) : cannot compute exact p-value with ties

wilcox.test(AS, JCVD)$p.value   
[1] 0.009331355
Warning message:
In wilcox.test.default(AS, JCVD) : cannot compute exact p-value with ties

wilcox.test(BW, JCVD)$p.value
[1] 0.06252571

Remember how our Tukey’s range test identified ‘AS’ and ‘JCVD’ as being statistically different? Here, we come close to the same conclusion, but we can’t reach it (0.0093>0.05/6=0.0083). This might sound disappointing, but if we didn’t apply the Bonferroni correction, we would also falsely conclude that SS and JCVD are different (0.0274<0.05). That’s the price to pay to avoid false positives.
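
As a side note, R can run all six Wilcoxon tests in a single call with the pairwise.wilcox.test() function, adjusting the p-values themselves instead of the threshold (with the Bonferroni method, each raw p-value is multiplied by the number of comparisons, capped at 1, which amounts to the same decision rule as lowering the threshold):

pairwise.wilcox.test(film$rating, film$actor, p.adjust.method = "bonferroni")

The adjusted p-values can then be compared directly to 0.05; for instance, the AS-JCVD test would come out at 0.0093*6≈0.056, still just above 0.05, consistent with our conclusion.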
