1

library(tidyverse) # for %>%, mutate(), group_by(), summarise(), and ggplot()

n2 <- read.csv("http://whlevine.hosted.uark.edu/psyc5143/ancova.csv")

# making Method a factor
n2 <- n2 %>% 
    mutate(Method = as.factor(Method))

# checking
contrasts(n2$Method) # dummy codes
##   1
## 0 0
## 1 1
# a

# I'm storing things like means and model summaries in objects for markdown
# purposes
n2Means <- n2 %>% group_by(Method) %>% summarise(M = mean(Posttest))
  1. The means of the posttest scores for Methods 0 and 1 are 74.6667 and 89, respectively.

# b

model_n2 <- lm(Posttest ~ Method, n2)
summary(model_n2)
  1. The two groups differ significantly, \(F(1, 28) = 12.9, p = .001\), with Method 1 scoring higher than Method 0 on the posttest.
# c

n2 <- n2 %>% 
    mutate(pretest.c = Pretest - mean(Pretest))

model_n2_ancova <- lm(Posttest ~ Method + pretest.c, n2)
library(lmSupport) # for modelSummary()
modelSummary(model_n2_ancova, t = FALSE)
  1. Controlling for pretest scores, there is (still) a significant difference between the two methods, \(F(1, 27) = 16.64, p < .001\). Notice the loss of 1 \(df\) to the covariate. (A quick check of these numbers appears after this list.)

  2. The y-intercept of 75.2 is the predicted posttest score for someone in Method 0 with a mean pretest score. The Method slope of 13.2 is the predicted difference between Method 1 and Method 0 for someone with a mean pretest score; that is, we predict a score of 75.2 + 13.2 = 88.4 for Method 1. Finally, the slope for the pretest (0.47 or so) is the predicted increase in posttest scores per unit increase in pretest scores, holding Method constant; the model assumes this increase is the same for both groups.
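
Both of these can be double-checked with a couple of lines of code. A minimal sketch, assuming the objects defined above: the \(F(1, 27)\) for Method comes from comparing a covariate-only model to the ANCOVA model, and the coefficients reproduce the interpretation in item 2.

# model-comparison check of the F-test for Method, controlling for pretest.c
anova(lm(Posttest ~ pretest.c, n2), model_n2_ancova)

# the fitted coefficients: intercept, Method slope, pretest slope
coef(model_n2_ancova)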

# adjusted means the easy way
library(effects)
effect("Method", model_n2_ancova)
  1. The regression equation is

\(\hat{Y} = 75.2 + 13.2 \times Method + 0.47 \times pretest_c\)

  2. Plugging in values of 0 (for Method 0) and 0 (for pretest.c) gives us

\(\hat{Y} = 75.2 + 13.2 \times 0 + 0.47 \times 0 = 75.2\)

  3. Plugging in values of 1 (for Method 1) and 0 (for pretest.c) gives us

\(\hat{Y} = 75.2 + 13.2 \times 1 + 0.47 \times 0 = 88.4\)
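
The same two adjusted means can be recovered with predict(); a minimal sketch, assuming model_n2_ancova from above:

# predicted posttest for each Method at the mean pretest (pretest.c = 0)
predict(model_n2_ancova,
        newdata = data.frame(Method = factor(c("0", "1")), pretest.c = 0))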

I wound up describing these in my answer to part d.

# f
n2_int_model <- lm(Posttest ~ Method*pretest.c, n2)
summary(n2_int_model)
  1. The equal-slopes assumption is on shaky ground. A test of the interaction of Method and pretest is not significant, \(t(26) = 1.9, p = .07\), but it’s not negligible, either. See the graph below for a visualization of the two slopes, and the sketch after it for the slope estimates themselves.
n2 %>% ggplot(aes(y = Posttest, x = Pretest, group = Method, color = Method)) +
    geom_point() +
    geom_smooth(se = FALSE, method = "lm") +
    theme_minimal()
## `geom_smooth()` using formula 'y ~ x'
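
The two fitted slopes can also be read directly off the interaction model. A small sketch, assuming the default coefficient names lm() assigns here (Method1, pretest.c, and Method1:pretest.c):

b <- coef(n2_int_model)
# Method 0's pretest slope, and Method 1's (Method 0's plus the interaction)
c(Method0 = unname(b["pretest.c"]),
  Method1 = unname(b["pretest.c"] + b["Method1:pretest.c"]))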

n2_covariate_DV <- lm(Pretest ~ Method, n2)
summary(n2_covariate_DV)
n2 %>% group_by(Method) %>% summarise(M = mean(Pretest))
  1. The groups do not differ significantly with respect to the covariate, \(t(28) = 0.47, p = .64\), with the group means being 62.3 and 64.7.

2

n3 <- read.csv("https://whlevine.hosted.uark.edu/psyc5143/april13.csv")

# a
n3 %>% group_by(group) %>% summarise(M = mean(Y))
n3 %>% group_by(group) %>% summarise(M = mean(Z))
mean(n3$Z)
n3 %>% group_by(group) %>% summarise(M = mean(Zbad))
mean(n3$Zbad)

# b
n3b_model <- lm(Y ~ X1 + X2 + Z, n3)
coef(n3b_model)

# c
n3_model <- lm(Y ~ X1 + X2 + Zbad, n3)
coef(n3_model)
  1. The group means for the outcome are 10, 20, and 30. (Not required, but useful info: The overall mean of Zbad = 50.3 or so. The Zbad group means are 60.9, 52.2, and 37.7. The latter are quite different, suggesting strongly that Zbad is not independent of the grouping variable. By contrast, the means for the groups for the Z covariate are 47.0, 46.9, and 49.4, not very different at all.)

  2. See the code above.

  3. See the code above.

  4. The regression equation is

\(\hat{Y} = 6.5 + 19.3 \times X1 + 13.9 \times X2 + 0.27 \times Zbad\)

To get predictions for the groups, I’ll plug in \(X1 = -\frac{2}{3}\) and \(X2 = 0\) for A1, \(X1 = \frac{1}{3}\) and \(X2 = -\frac{1}{2}\) for A2, and \(X1 = \frac{1}{3}\) and \(X2 = \frac{1}{2}\) for A3, along with \(Zbad = 50.3\) and \(Z = 47.75\).

The predicted scores of A1, A2, and A3 using the overall mean of Zbad are, respectively:

\(\hat{Y} = 6.5 + 19.3 \times -\frac{2}{3} + 13.9 \times 0 + 0.27 \times 50.3\) = 7.2143

\(\hat{Y} = 6.5 + 19.3 \times \frac{1}{3} + 13.9 \times -\frac{1}{2} + 0.27 \times 50.3\) = 19.5643

\(\hat{Y} = 6.5 + 19.3 \times \frac{1}{3} + 13.9 \times \frac{1}{2} + 0.27 \times 50.3\) = 33.4643

The predicted scores of A1, A2, and A3 using the overall mean of Z are, respectively:

\(\hat{Y} = 8.9 + 14.7 \times -\frac{2}{3} + 9.4 \times 0 + 0.23 \times 47.75\) = 10.0825

\(\hat{Y} = 8.9 + 14.7 \times \frac{1}{3} + 9.4 \times -\frac{1}{2} + 0.23 \times 47.75\) = 20.0825

\(\hat{Y} = 8.9 + 14.7 \times \frac{1}{3} + 9.4 \times \frac{1}{2} + 0.23 \times 47.75\) = 29.4825

(Note that these are somewhat rounded.)
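
All six of these plug-in computations can be reproduced with predict(). A minimal sketch, assuming the models fit above; the codes data frame is my own scaffolding, not part of the assignment:

# contrast codes for A1, A2, and A3, one row per group
codes <- data.frame(X1 = c(-2/3, 1/3, 1/3), X2 = c(0, -1/2, 1/2))

# predictions at the grand mean of each covariate
predict(n3_model, newdata = cbind(codes, Zbad = mean(n3$Zbad)))  # Zbad model
predict(n3b_model, newdata = cbind(codes, Z = mean(n3$Z)))       # Z model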

  5. The predicted scores of A1, A2, and A3 using the group means of Zbad are, respectively:

\(\hat{Y} = 6.5 + 19.3 \times -\frac{2}{3} + 13.9 \times 0 + 0.27 \times 60.9\) = 10.0763

\(\hat{Y} = 6.5 + 19.3 \times \frac{1}{3} + 13.9 \times -\frac{1}{2} + 0.27 \times 52.2\) = 20.0773

\(\hat{Y} = 6.5 + 19.3 \times \frac{1}{3} + 13.9 \times \frac{1}{2} + 0.27 \times 37.7\) = 30.0623
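
And the same predictions using each group’s own mean of Zbad, reusing the codes data frame from the sketch above:

# group means of Zbad; this assumes the groups sort as A1, A2, A3
zbad_means <- n3 %>% group_by(group) %>% summarise(M = mean(Zbad))
predict(n3_model, newdata = cbind(codes, Zbad = zbad_means$M))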

  6. The predicted scores from part e are closer to the actual means. When groups differ substantially with respect to the covariate, using the grand mean of the covariate to generate predicted scores (“adjusted means”) leads to predictions about an unusual (i.e., far from its group’s mean) individual in each group. It’s not a good idea to draw conclusions about differences between groups based on predicted scores that correspond to such unusual cases!