# Dunnett’s Multiple Comparison Test of Differences Using R

I recently encountered a statistical question that involved comparing a difference in some characteristic across multiple groups simultaneously. I normally use statistical programs such as Minitab to run multiple comparison tests (e.g. Dunnett’s, Duncan’s), but I couldn’t find a straightforward way to compare such differences among groups with them.

Here is an example. There are 4 classrooms (A to D), and the students in each classroom are divided into a male group and a female group of three students each. The height of every student was measured. I want to test whether the difference in height between male and female students differs significantly between classrooms, comparing all classrooms with each other.

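| Class | Sex | Height [cm] |
|---|---|---|
| A | Male | 175 |
| A | Male | 169 |
| A | Male | 177 |
| A | Female | 162 |
| A | Female | 167 |
| A | Female | 155 |
| B | Male | 183 |
| B | Male | 171 |
| B | Male | 179 |
| B | Female | 171 |
| B | Female | 166 |
| B | Female | 164 |
| C | Male | 162 |
| C | Male | 175 |
| C | Male | 176 |
| C | Female | 161 |
| C | Female | 165 |
| C | Female | 152 |
| D | Male | 175 |
| D | Male | 178 |
| D | Male | 172 |
| D | Female | 161 |
| D | Female | 167 |
| D | Female | 153 |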
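If you prefer to skip the spreadsheet step, here is a minimal sketch that builds the same data frame directly in R. It uses the column names Group, Sex and Measure described further down; the commented-out `write.csv()` line is an optional way to produce test.csv from it.

```r
# Example data from the table above, built directly in R.
# factor() sorts levels alphabetically, so Sex has levels "Female", "Male".
dat <- data.frame(
  Group   = factor(rep(c("A", "B", "C", "D"), each = 6)),
  Sex     = factor(rep(rep(c("Male", "Female"), each = 3), times = 4)),
  Measure = c(175, 169, 177, 162, 167, 155,   # class A
              183, 171, 179, 171, 166, 164,   # class B
              162, 175, 176, 161, 165, 152,   # class C
              175, 178, 172, 161, 167, 153)   # class D
)

# Optional: write the CSV file used in the rest of this post.
# write.csv(dat, "test.csv", row.names = FALSE)

# Each class should have three male and three female students.
print(table(dat$Group, dat$Sex))
```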

First, calculate the difference between the average male height and the average female height in each class.

| Class | Avg Male [cm] | Avg Female [cm] | Difference [cm] |
|---|---|---|---|
| A | 173.7 | 161.3 | 12.3 |
| B | 177.7 | 167.0 | 10.7 |
| C | 171.0 | 159.3 | 11.7 |
| D | 175.0 | 160.3 | 14.7 |
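The averages and differences above can also be computed in R. This is a sketch that rebuilds the example data inline, assuming the Group / Sex / Measure column layout used in this post:

```r
dat <- data.frame(
  Group   = factor(rep(c("A", "B", "C", "D"), each = 6)),
  Sex     = factor(rep(rep(c("Male", "Female"), each = 3), times = 4)),
  Measure = c(175, 169, 177, 162, 167, 155,
              183, 171, 179, 171, 166, 164,
              162, 175, 176, 161, 165, 152,
              175, 178, 172, 161, 167, 153)
)

# Mean height per class (rows) and sex (columns).
means <- with(dat, tapply(Measure, list(Group, Sex), mean))

# Male minus female difference in each class.
diffs <- means[, "Male"] - means[, "Female"]

print(round(cbind(means, Diff = diffs), 1))
print(names(which.max(diffs)))  # class with the largest difference
print(names(which.min(diffs)))  # class with the smallest difference
```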

Looking at the numbers, D has the largest male-female difference in height and B has the smallest. Are these differences between classes statistically significant, and how do we test it?

In R, the multcomp package provides what is needed for this type of analysis, in particular the `glht()` function for simultaneous tests of general linear hypotheses.

See the multcomp package documentation for more details.

`install.packages("multcomp")  ## install the package (only needed once)`

`library(multcomp)  ## load the package`

Before starting in R, save the data above as test.csv, with the column names Group, Sex and Measure in the first row. Note the directory where you save the file (e.g. ./data).

Set the working directory and read the test.csv file:

`setwd("./data")`

`dat <- read.csv("test.csv", stringsAsFactors = TRUE)  ## read Group and Sex as factors`
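To check that the file was read as intended, look at the column types and factor levels. The sketch below is self-contained: it writes the example data to a temporary file (a stand-in for your ./data/test.csv) and reads it back the same way. `stringsAsFactors = TRUE` matters because `read.csv()` in R 4.0 and later keeps text columns as character by default.

```r
# Write the example data to a temporary CSV (stand-in for ./data/test.csv).
raw <- data.frame(
  Group   = rep(c("A", "B", "C", "D"), each = 6),
  Sex     = rep(rep(c("Male", "Female"), each = 3), times = 4),
  Measure = c(175, 169, 177, 162, 167, 155,
              183, 171, 179, 171, 166, 164,
              162, 175, 176, 161, 165, 152,
              175, 178, 172, 161, 167, 153)
)
path <- file.path(tempdir(), "test.csv")
write.csv(raw, path, row.names = FALSE)

# Read it back as in the post and inspect the result.
dat <- read.csv(path, stringsAsFactors = TRUE)
str(dat)                # Group and Sex should be factors, Measure numeric
print(levels(dat$Sex))  # "Female" comes first, so it is the reference level
```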

Then fit the ANOVA model. Measure is the height, Group is the class (A to D) and Sex is Male or Female.

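Since the male-female difference may depend on the class, Group and Sex may interact, so use `*` in the formula: `Group*Sex` fits both main effects and their interaction (it is shorthand for `Group + Sex + Group:Sex`). The `-1` removes the intercept term, so each group gets its own coefficient.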
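To see what `*` and `-1` actually produce, you can inspect the design matrix R builds from the formula. This sketch rebuilds the example data inline and lists the design-matrix columns; they correspond to the coefficient names printed by `coef(mod)` in this post.

```r
dat <- data.frame(
  Group   = factor(rep(c("A", "B", "C", "D"), each = 6)),
  Sex     = factor(rep(rep(c("Male", "Female"), each = 3), times = 4)),
  Measure = c(175, 169, 177, 162, 167, 155,
              183, 171, 179, 171, 166, 164,
              162, 175, 176, 161, 165, 152,
              175, 178, 172, 161, 167, 153)
)

# Group*Sex is shorthand for Group + Sex + Group:Sex; "- 1" drops the intercept,
# so all four Group levels get their own column.
X <- model.matrix(~ Group * Sex - 1, data = dat)
print(colnames(X))

# Writing the terms out explicitly gives the same columns.
X2 <- model.matrix(~ Group + Sex + Group:Sex - 1, data = dat)
print(identical(colnames(X), colnames(X2)))
```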

`mod <- aov(Measure ~ Group*Sex - 1, data = dat)`

Print the coefficients:

`coef(mod)`

| Coefficient | Value |
|---|---|
| GroupA | 161.3333333 |
| GroupB | 167.0000000 |
| GroupC | 159.3333333 |
| GroupD | 160.3333333 |
| SexMale | 12.3333333 |
| GroupB:SexMale | -1.6666667 |
| GroupC:SexMale | -0.6666667 |
| GroupD:SexMale | 2.3333333 |

You may wonder what these numbers mean, but they are not hard to interpret.

- The four Group coefficients (GroupA to GroupD) are simply the average female height in each group.
- SexMale (12.3333) is the male-minus-female height difference in Group A.
- GroupB:SexMale (-1.6667) is the difference between the male-female differences in B and A (10.6667 - 12.3333).
- GroupC:SexMale (-0.6667) is the same for C and A (11.6667 - 12.3333).
- GroupD:SexMale (2.3333) is the same for D and A (14.6667 - 12.3333).

As you can see, A is always the reference for these numbers (and Female is the reference level for Sex, because factor levels are sorted alphabetically by default).
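This interpretation can be verified by computing the same quantities by hand from the cell means and comparing them with `coef(mod)`. A sketch that rebuilds the example data inline:

```r
dat <- data.frame(
  Group   = factor(rep(c("A", "B", "C", "D"), each = 6)),
  Sex     = factor(rep(rep(c("Male", "Female"), each = 3), times = 4)),
  Measure = c(175, 169, 177, 162, 167, 155,
              183, 171, 179, 171, 166, 164,
              162, 175, 176, 161, 165, 152,
              175, 178, 172, 161, 167, 153)
)

mod <- aov(Measure ~ Group * Sex - 1, data = dat)
cf  <- coef(mod)

# Cell means and male-female differences computed by hand.
means <- with(dat, tapply(Measure, list(Group, Sex), mean))
d     <- means[, "Male"] - means[, "Female"]

by_hand <- c(means[, "Female"],             # GroupA..GroupD: female means
             d["A"],                         # SexMale: difference in A
             d[c("B", "C", "D")] - d["A"])   # GroupX:SexMale: difference in X minus difference in A

# Side by side: model coefficients and hand-computed values.
print(round(cbind(coef = cf, by_hand = unname(by_hand)), 4))
```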

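So testing whether the male-female difference is the same in A and another class is a test on a single interaction coefficient:

- A vs B: test `GroupB:SexMale = 0`
- A vs C: test `GroupC:SexMale = 0`
- A vs D: test `GroupD:SexMale = 0`

These three comparisons against one reference class are the Dunnett-type (many-to-one) comparisons.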

How about the remaining comparisons? They are tests on differences between interaction coefficients:

- B vs C: test `GroupB:SexMale - GroupC:SexMale = 0`
- B vs D: test `GroupB:SexMale - GroupD:SexMale = 0`
- C vs D: test `GroupC:SexMale - GroupD:SexMale = 0`

Together with the three comparisons against A, these cover all pairwise comparisons among the four classes.
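The six hypotheses above can also be passed to `glht()` as an explicit contrast matrix, with one row per hypothesis and one column per model coefficient. This is a sketch assuming multcomp is installed; the row labels such as "B - A" are hypothetical names chosen for readability. Keeping only the first three rows gives the Dunnett-type comparisons against A on their own, which are then adjusted for three tests instead of six.

```r
library(multcomp)

dat <- data.frame(
  Group   = factor(rep(c("A", "B", "C", "D"), each = 6)),
  Sex     = factor(rep(rep(c("Male", "Female"), each = 3), times = 4)),
  Measure = c(175, 169, 177, 162, 167, 155,
              183, 171, 179, 171, 166, 164,
              162, 175, 176, 161, 165, 152,
              175, 178, 172, 161, 167, 153)
)
mod <- aov(Measure ~ Group * Sex - 1, data = dat)

# Contrast matrix: one row per hypothesis, one column per coefficient.
cn <- names(coef(mod))
K <- matrix(0, nrow = 6, ncol = length(cn),
            dimnames = list(c("B - A", "C - A", "D - A", "B - C", "B - D", "C - D"), cn))
K["B - A", "GroupB:SexMale"] <- 1
K["C - A", "GroupC:SexMale"] <- 1
K["D - A", "GroupD:SexMale"] <- 1
K["B - C", c("GroupB:SexMale", "GroupC:SexMale")] <- c(1, -1)
K["B - D", c("GroupB:SexMale", "GroupD:SexMale")] <- c(1, -1)
K["C - D", c("GroupC:SexMale", "GroupD:SexMale")] <- c(1, -1)

all_pairs <- glht(mod, linfct = K)                        # all six comparisons
vs_A      <- glht(mod, linfct = K[1:3, , drop = FALSE])   # only the comparisons with A
print(summary(all_pairs))
print(summary(vs_A))
```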

The code to test all six hypotheses simultaneously with `glht()` is:

`summary(glht(mod, linfct = c("GroupB:SexMale = 0", "GroupC:SexMale = 0", "GroupD:SexMale = 0", "GroupB:SexMale - GroupC:SexMale = 0", "GroupB:SexMale - GroupD:SexMale = 0", "GroupC:SexMale - GroupD:SexMale = 0")))`

Simultaneous Tests for General Linear Hypotheses

Fit: `aov(formula = Measure ~ Group * Sex - 1, data = dat)`

Linear Hypotheses:

| Hypothesis | Estimate | Std. Error | t value | Pr(>\|t\|) |
|---|---|---|---|---|
| GroupB:SexMale == 0 | -1.6667 | 6.6792 | -0.250 | 0.994 |
| GroupC:SexMale == 0 | -0.6667 | 6.6792 | -0.100 | 1.000 |
| GroupD:SexMale == 0 | 2.3333 | 6.6792 | 0.349 | 0.985 |
| GroupB:SexMale - GroupC:SexMale == 0 | -1.0000 | 6.6792 | -0.150 | 0.999 |
| GroupB:SexMale - GroupD:SexMale == 0 | -4.0000 | 6.6792 | -0.599 | 0.931 |
| GroupC:SexMale - GroupD:SexMale == 0 | -3.0000 | 6.6792 | -0.449 | 0.969 |

(Adjusted p values reported -- single-step method)

In this example, none of the comparisons is statistically significant: every adjusted p-value (the rightmost column) is well above 0.05.
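Rather than reading the p-values off the printed table, you can extract them from the summary object, and `confint()` gives simultaneous confidence intervals for the same six differences. A sketch assuming multcomp is installed, refitting the model from the example data with the same hypotheses as in this post; for this data every interval should contain 0, consistent with the large p-values.

```r
library(multcomp)

dat <- data.frame(
  Group   = factor(rep(c("A", "B", "C", "D"), each = 6)),
  Sex     = factor(rep(rep(c("Male", "Female"), each = 3), times = 4)),
  Measure = c(175, 169, 177, 162, 167, 155,
              183, 171, 179, 171, 166, 164,
              162, 175, 176, 161, 165, 152,
              175, 178, 172, 161, 167, 153)
)
mod <- aov(Measure ~ Group * Sex - 1, data = dat)

g <- glht(mod, linfct = c("GroupB:SexMale = 0", "GroupC:SexMale = 0", "GroupD:SexMale = 0",
                          "GroupB:SexMale - GroupC:SexMale = 0",
                          "GroupB:SexMale - GroupD:SexMale = 0",
                          "GroupC:SexMale - GroupD:SexMale = 0"))

p <- summary(g)$test$pvalues   # single-step adjusted p-values, one per hypothesis
print(round(p, 3))
print(any(p < 0.05))           # is any comparison significant at the 5% level?

ci <- confint(g)               # 95% simultaneous confidence intervals
print(ci)
```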

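I think the main reason is that height varies a lot within each group relative to the differences being compared: the pooled within-group standard deviation is about 5.8 cm, and with only three students per group the standard error of each comparison (6.68 cm) is larger than any of the estimated differences (1 to 4 cm).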
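The size of that within-group variation can be checked directly from the fitted model. In this sketch, the pooled within-group SD is computed from the residuals, and the standard error of one interaction coefficient is read from `vcov()`. Since each such coefficient combines four cell means of three students each, its standard error should equal the pooled SD times sqrt(4/3).

```r
dat <- data.frame(
  Group   = factor(rep(c("A", "B", "C", "D"), each = 6)),
  Sex     = factor(rep(rep(c("Male", "Female"), each = 3), times = 4)),
  Measure = c(175, 169, 177, 162, 167, 155,
              183, 171, 179, 171, 166, 164,
              162, 175, 176, 161, 165, 152,
              175, 178, 172, 161, 167, 153)
)
mod <- aov(Measure ~ Group * Sex - 1, data = dat)

# Pooled within-group SD: residual sum of squares over residual df (24 - 8 = 16).
s_within <- sqrt(sum(residuals(mod)^2) / df.residual(mod))

# Standard error of one difference of differences, e.g. GroupB:SexMale.
se_int <- sqrt(diag(vcov(mod)))[["GroupB:SexMale"]]

# Var(dB - dA) = 4 * sigma^2 / 3, because each of the four cell means has variance sigma^2 / 3.
print(c(within_sd = s_within, se = se_int, by_formula = s_within * sqrt(4 / 3)))
```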

Please test with your own data to see if it works.
