---
title: "Problem Set 2 Answer Key"
output: html_document
---

```{r setup, include = FALSE}
options(digits = 3)
knitr::opts_chunk$set(echo = TRUE, fig.width = 4, fig.height = 4)
```

```{r, results='hide', message = FALSE}
library(tidyverse)
library(lmSupport)
library(psych)
library(car)
library(kableExtra)
options(knitr.kable.NA = '')
```

# 1

a)

Model C: predicted donation = overall mean

Model A: predicted donation = mean of group means + condition effect

$H_0: \mu_{legacy} = \mu_{control}$ or $\beta_1 = 0$
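
In equation form, the two models can be sketched as follows, assuming the condition is represented by a single numeric predictor $X_i$ (the symbols $\beta_0$ and $X_i$ are notation assumed here; the answer above names only $\beta_1$):

$$
\begin{aligned}
\text{Model C: } \widehat{donation}_i &= \beta_0 \\
\text{Model A: } \widehat{donation}_i &= \beta_0 + \beta_1 X_i
\end{aligned}
$$

Setting $\beta_1 = 0$ reduces Model A to Model C, which is why the null hypothesis can be stated either way.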

b)
```{r, message = FALSE}
n1 <- read.csv("https://whlevine.hosted.uark.edu/psyc5143/legacy.csv")
n1 %>% 
	group_by(group) %>% 
	summarise(M = mean(donation))
```

c) With contrast codes of $\pm 1/2$, the slope should be the difference in means (3.6 - 2.0 = 1.6) and the intercept should be the mean of the group means ((2.0 + 3.6)/2 = 2.8).

d)
```{r}
n1 <- n1 %>% 
	mutate(con1 = ifelse(group == "control", -1/2, 1/2))

# Model C
coef(lm(donation ~ 1, n1))

# Model A
coef(lm(donation ~ con1, n1))
```

Match! The Model A intercept (2.8) and slope (1.6) equal the values predicted in part c.

e) Match! Plugging each group's code into Model A reproduces the group means from part b:

$\hat{donation}_{control} = 2.8 + 1.6\times(-0.5) = 2.0$

$\hat{donation}_{legacy} = 2.8 + 1.6\times(0.5) = 3.6$

f) The intercept is the mean of the group means (and the grand mean as well, but only because the group sizes are equal). The slope is the difference between the group means.
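
One way to see why, sketched with the $\pm 1/2$ codes from part d and the $\mu$ notation from part a: Model A predicts each group's mean, so

$$
\begin{aligned}
\mu_{legacy} &= \beta_0 + \tfrac{1}{2}\beta_1 \\
\mu_{control} &= \beta_0 - \tfrac{1}{2}\beta_1
\end{aligned}
$$

Subtracting the second line from the first gives $\beta_1 = \mu_{legacy} - \mu_{control}$, and averaging the two lines gives $\beta_0 = (\mu_{legacy} + \mu_{control})/2$. With the sample means of 3.6 and 2.0, these are 1.6 and 2.8.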

g)
```{r}
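# store the Model A summary and the group means for inline reporting
n1summary <- modelSummary(lm(donation ~ con1, n1))
n1means <- aggregate(donation ~ group, n1, mean)  # row 1 = control, row 2 = legacy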
```
The legacy-primed group gave significantly greater donations (*M* = $`r n1means[2, 2]`) than the control group (*M* = $`r n1means[1, 2]`), *t*(`r n1summary$df[2]`) = `r n1summary$coefficients[2, 3]`, *p* = `r n1summary$coefficients[2, 4]`.

h)
```{r}
confint(lm(donation ~ con1, n1)) -> n1ci
```

The 95% CI for the slope (i.e., the group mean difference) is [`r n1ci[2, 1:2]`].

# 2

```{r}
n2 <- read.csv("https://whlevine.hosted.uark.edu/psyc5143/ps3.csv")
n2means <- n2 %>% group_by(group) %>% summarise(M = mean(Y))
# group means from n2means, entered by hand: prime = 22.4, control = 19.9
n2meandiff <- 22.4 - 19.9
n2meanofmeans <- (22.4 + 19.9)/2

n2 <- n2 %>% 
	mutate(b = ifelse(group == "prime", 1, -1),
				 c = ifelse(group == "prime", 1/2, -1/2),
				 d = ifelse(group == "prime", 1, 0),
				 e = ifelse(group == "prime", -1, 0),
				 f = ifelse(group == "prime", 5, 3))

n2b <- coef(lm(Y ~ b, n2))
n2c <- coef(lm(Y ~ c, n2))
n2d <- coef(lm(Y ~ d, n2))
n2e <- coef(lm(Y ~ e, n2))
n2f <- coef(lm(Y ~ f, n2))
```

a) The group means are 22.4 and 19.9 for the "prime" and "control" groups, respectively. These differ by `r n2meandiff` and the mean of these two values is `r n2meanofmeans`.

b) The intercept and slope are `r n2b`, respectively. These are the mean of the group means and half the difference between the group means.

c) The intercept and slope are `r n2c`, respectively. These are the mean of the group means and the difference between the group means.

d) The intercept and slope are `r n2d`, respectively. These are the control group mean and the difference between the group means (prime - control).

e) The intercept and slope are `r n2e`, respectively. These are the control group mean and the difference between the group means (control - prime).

f) The intercept and slope are `r n2f`, respectively. The y-intercept is the predicted value at a code of 0, which lies outside the codes actually used (3 and 5), so it isn't especially interpretable here. The slope is half the difference between the group means because the two codes are 2 units apart, just like $\pm1$.
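
All five coding schemes follow one rule: with only two distinct codes, the regression line passes through both group means, so the slope is the mean difference divided by the distance between the codes, and the intercept is whatever that line predicts at a code of 0. The chunk below is a minimal sketch of that rule using made-up toy data; `toy` and `check_codes()` are hypothetical names for illustration and are not part of the problem set.

```{r}
# Made-up data (NOT the problem-set data): control mean = 20, prime mean = 23
toy <- data.frame(
  group = rep(c("control", "prime"), each = 4),
  Y     = c(18, 20, 19, 23, 21, 24, 22, 25)
)

# Compare lm() coefficients with the values implied by the two-point rule
check_codes <- function(code_prime, code_control) {
  x <- ifelse(toy$group == "prime", code_prime, code_control)
  fit <- unname(coef(lm(toy$Y ~ x)))
  m_prime   <- mean(toy$Y[toy$group == "prime"])
  m_control <- mean(toy$Y[toy$group == "control"])
  slope     <- (m_prime - m_control) / (code_prime - code_control)
  intercept <- m_control - slope * code_control   # line's prediction at code 0
  rbind(lm = fit, formula = c(intercept, slope))
}

check_codes(1, -1)   # like part b
check_codes(1, 0)    # like part d
check_codes(5, 3)    # like part f
```

In each call the two rows should agree, which is the sense in which the choice of codes only rescales and shifts the same fitted line.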

# 3

```{r, results='hide'}
n3 <- read.csv("https://whlevine.hosted.uark.edu/psyc5143/unequal.csv")

# a: group means, n, overall mean, mean of means
n3 %>% group_by(group) %>% 
	summarise(M = mean(Y),         # 20, 12
						n = length(Y)) %>%   # 9, 3
	ungroup()

# mean of means = 16

# grand/overall mean 
mean(n3$Y) # 18

# b
coef(lm(Y ~ X, n3)) # b0 = 16 (the mean of the means)

# c
9 * (20 - 18)^2 + 3 * (12 - 18)^2 # SS1 = 144
9 * (20 - 16)^2 + 3 * (12 - 16)^2 # SS2 = 192
anova(lm(Y ~ X, n3))              # SSR = 144
```

a) The group means are 20 and 12, based on 9 and 3 observations, respectively. The overall (grand) mean is 18. The mean of the group means (20 and 12) is 16.

b) The intercept for this model is equal to the mean of the means and **not** the overall/grand mean.

c) SSR for the model in part b is 144, which is equal to "SS1". So, even though the intercept in the augmented model is the mean of the group means, the improvement of that model is measured relative to a compact model that uses the overall/grand mean to make predictions.
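
The same point can be checked by direct model comparison. The sketch below uses made-up scores constructed to reproduce the group means (20 and 12) and group sizes (9 and 3) from part a. The individual values, the $\pm 1/2$ coding of `X`, and the names `toy3`, `fit_c`, and `fit_a` are assumptions for illustration, not the contents of `unequal.csv`.

```{r}
# Made-up scores with group means 20 (n = 9) and 12 (n = 3)
toy3 <- data.frame(
  X = rep(c(1/2, -1/2), times = c(9, 3)),   # assumed contrast coding
  Y = c(18, 19, 19, 20, 20, 20, 21, 21, 22, 11, 12, 13)
)

fit_c <- lm(Y ~ 1, toy3)   # compact model: predicts the grand mean for everyone
fit_a <- lm(Y ~ X, toy3)   # augmented model: predicts each group's own mean

coef(fit_c)                # intercept = grand mean
coef(fit_a)[1]             # intercept = mean of the group means

sse_c <- sum(resid(fit_c)^2)
sse_a <- sum(resid(fit_a)^2)
sse_c - sse_a              # SSR: reduction in error going from Model C to Model A
```

Here the reduction in error is 144, matching "SS1", because Model C's predictions are the grand mean (18) rather than the mean of the group means (16).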