Comparing more than two samples: Kruskal-Wallis rank sum test
“Blah blah blah Flo! You talk nice and pretty, but our data are crappy: we have less than 30 records per sample, and it doesn’t look normal! What should we do then, Big Cheese?!”
The Kruskal-Wallis rank sum test is equivalent to a single-factor ANOVA, but is used when you have “crappy” data (i.e. when you have few individuals in at least one of your samples and your data are not normally distributed). It is an extension of the Mann-Whitney-Wilcoxon test to several samples. We can use the same type of formulation as with the “aov()” function:
kruskal.test(rating~actor,data=film)
Kruskal-Wallis rank sum test
data: rating by actor
Kruskal-Wallis chi-squared = 8.7367, df = 3, p-value = 0.033
Or, if each sample is entered as a separate vector of values, we can coerce them into a list:
Creating a rating vector for each actor through a logical test (check Chapter 5 for more details):
SS=film$rating[film$actor=="SS"]
AS=film$rating[film$actor=="AS"]
BW=film$rating[film$actor=="BW"]
JCVD=film$rating[film$actor=="JCVD"]
Test:
kruskal.test(list(SS,AS,BW,JCVD))
Kruskal-Wallis rank sum test
data: list(SS, AS, BW, JCVD)
Kruskal-Wallis chi-squared = 8.7367, df = 3, p-value = 0.033
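By the way, the output of kruskal.test() can be stored like any other test result in R, and its pieces extracted directly (the same $p.value trick I’ll use below for the Wilcoxon tests). A minimal sketch:
# Store the test result and extract its components
kw=kruskal.test(rating~actor,data=film)
kw$statistic   # the Kruskal-Wallis chi-squared value
kw$p.value     # the p-value (about 0.033 here)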
If the test detects a statistically significant difference in your dataset (i.e. at least one of the groups differs from the others), you can proceed to identify which group(s) is(are) responsible by comparing the groups two by two, with a Wilcoxon rank-sum test for each pair. If you decide to proceed this way, remember that you will have to account for the fact that you are running multiple tests.
Let’s assume that you have 3 groups (A, B and C) and that you detected that at least one of them is different from the others: you now need to compare the groups with each other. If you run 3 tests (comparing pairs A to B, A to C, and B to C), each with a 5% probability of being wrong, the probability that at least one of them is wrong is now equal to 1 minus the probability that all of them are right: 1-0.95*0.95*0.95 ≈ 0.14!
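(A quick sanity check of that arithmetic in R, using the 5% level and the 3 tests from the example; nothing here beyond base R:)
# Probability that at least one of 3 tests at the 5% level gives a false positive
1-0.95^3           # 0.142625, i.e. roughly 14%
# More generally, for n.tests independent tests:
n.tests=3
1-(1-0.05)^n.tests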
To make sure that, overall, we are not taking an inflated risk of a false positive, we need to lower the individual probability threshold used to consider each test statistically significant. This is called the Bonferroni correction: we divide our original significance level by the total number of comparisons made. In the case of 3 groups, that’s 3 comparisons (A-B, A-C, B-C), so we divide 0.05 by 3. If we have 4 groups, we now have 6 comparisons to do (A-B, A-C, A-D, B-C, B-D, C-D) and we would have to use a significance level of 0.05/6=0.0083… As you can see, as the number of groups increases, the adjustment becomes increasingly strict.
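If you don’t want to list the pairs by hand, base R can count them for you with choose() and give the corrected threshold; a small sketch for the 4 actors of the film example:
# Number of pairwise comparisons among k groups, and the Bonferroni-corrected threshold
k=4
n.comparisons=choose(k,2)   # 6 pairs for 4 groups
0.05/n.comparisons          # about 0.0083, the corrected significance level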
For each test, I’m going to directly extract the corresponding p.value:
wilcox.test(SS, AS)$p.value
[1] 0.6453768
wilcox.test(SS, BW)$p.value
[1] 0.8883587
wilcox.test(SS, JCVD)$p.value
[1] 0.02739613
wilcox.test(AS, BW)$p.value
[1] 0.3356275
Warning message:
In wilcox.test.default(AS, BW) : cannot compute exact p-value with ties
wilcox.test(AS, JCVD)$p.value
[1] 0.009331355
Warning message:
In wilcox.test.default(AS, JCVD) : cannot compute exact p-value with ties
wilcox.test(BW, JCVD)$p.value
[1] 0.06252571
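If typing the six tests by hand feels tedious, the pairwise.wilcox.test() function (in base R’s stats package) runs every pairwise Wilcoxon test and adjusts the p-values in one go; with the Bonferroni method the p-values come back already multiplied by 6, so you compare them to the usual 0.05. A sketch on the same film data:
# All pairwise Wilcoxon rank-sum tests with Bonferroni-adjusted p-values
pairwise.wilcox.test(film$rating,film$actor,p.adjust.method="bonferroni")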
Remember how our Tukey’s range test identified ‘AS’ and ‘JCVD’ as being statistically different? Here, we come close to being able to draw the same conclusion, but we can’t (0.0093>0.05/6). This might sound disappointing, but if we didn’t apply the Bonferroni correction, we would also falsely conclude that SS and JCVD are different. That’s the price to pay to avoid false positives.
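Equivalently, instead of lowering the threshold you can inflate the p-values with p.adjust(): with method="bonferroni" each raw p-value is multiplied by the number of tests (capped at 1) and then compared to the plain 0.05. A sketch with the six p-values obtained above:
# Bonferroni-adjust the six raw p-values and compare them to 0.05
raw.p=c(0.6453768,0.8883587,0.02739613,0.3356275,0.009331355,0.06252571)
p.adjust(raw.p,method="bonferroni")   # the AS-JCVD pair climbs to about 0.056, just above 0.05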