---
title: "Problem Set 6 Answer Key"
output: html_document
---

```{r setup, results = 'hide', message = FALSE, include = FALSE}
options(digits = 4, scipen = 999)
knitr::opts_chunk$set(echo = TRUE, fig.width = 4, fig.height = 4)
library(tidyverse)
library(lmSupport)
library(psych)
library(car)
library(kableExtra)
library(pwr)
options(knitr.kable.NA = '')
```

# 1

a) Cell and marginal means are below.

```{r, message = FALSE}
d <- read.csv("https://whlevine.hosted.uark.edu/psyc5143/avoidance.csv")

# better labels; this does some of the work for me for part f
d <- d %>% 
	mutate(areaF = ifelse(area == 1, "neutral", "A"),
				 delayF = case_when(delay == 1 ~ "50",
				 									 delay == 2 ~ "100",
				 									 delay == 3 ~ "150"))

# means
d %>% group_by(areaF, delayF) %>% summarize(M = mean(latency))
d %>% group_by(areaF) %>% summarise(M = mean(latency))
d %>% group_by(delayF) %>% summarise(M = mean(latency))
```

b) See the code below. Your answers will differ somewhat if you used different contrasts than I did for the delay factor.

```{r}
# contrast codes
d <- d %>% 
	mutate(areaC = ifelse(areaF == "neutral", 1/2, -1/2),
				 delayC1 = ifelse(delayF == "50", -2/3, 1/3),
				 delayC2 = case_when(delayF == "100" ~ 1/2,
				 										delayF == "150" ~ -1/2,
				 										TRUE ~ 0),
				 int1 = areaC*delayC1,
				 int2 = areaC*delayC2)
```

c) The intercept below is the mean of the cell means. The areaC slope is the difference between the neutral area and Area A means (neutral minus Area A, given the codes). The delayC1 slope is the difference between the mean of the 100 & 150 msec conditions combined and the mean of the 50 msec condition. The delayC2 slope is the difference between the 100 and 150 msec condition means. The int1 slope is the difference between the 100/150 msec combined vs 50 msec difference in the neutral area and the same difference in Area A. The int2 slope is the difference between the 100 vs 150 msec difference in the neutral area and the same difference in Area A.

```{r}
model <- lm(latency ~ areaC + delayC1 + delayC2 + int1 + int2, d)
coef(model)
```
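
If you want to convince yourself of these interpretations, here is a minimal sketch that fits the same contrast-coded model to the six cell means alone and compares the coefficients with the differences computed directly. The neutral means and the Area A means at 50 and 100 msec are the ones quoted in part i; the Area A, 150 msec value (26) is not stated anywhere in this key and is an assumption inferred from the SSRs in part g. The names `cells`, `b`, `m`, and `by_hand` are made up for this illustration.

```{r}
# Cell means: the Area A, 150 msec value (26) is an ASSUMPTION (see text above)
cells <- data.frame(
  area  = rep(c("neutral", "A"), each = 3),
  delay = rep(c("50", "100", "150"), times = 2),
  M     = c(26, 25, 25, 14, 25, 26)
)

# same contrast codes as in part b
cells$areaC   <- ifelse(cells$area == "neutral", 1/2, -1/2)
cells$delayC1 <- ifelse(cells$delay == "50", -2/3, 1/3)
cells$delayC2 <- ifelse(cells$delay == "100", 1/2,
                        ifelse(cells$delay == "150", -1/2, 0))
cells$int1 <- cells$areaC * cells$delayC1
cells$int2 <- cells$areaC * cells$delayC2

# six means, six parameters: the model reproduces the cell means exactly
b <- coef(lm(M ~ areaC + delayC1 + delayC2 + int1 + int2, cells))

# the same quantities computed straight from the table of cell means
m <- with(cells, tapply(M, list(area, delay), mean))  # rows = area, cols = delay
by_hand <- c(
  intercept = mean(cells$M),
  areaC     = mean(m["neutral", ]) - mean(m["A", ]),
  delayC1   = mean(m[, c("100", "150")]) - mean(m[, "50"]),
  delayC2   = mean(m[, "100"]) - mean(m[, "150"]),
  int1      = (mean(m["neutral", c("100", "150")]) - m["neutral", "50"]) -
              (mean(m["A", c("100", "150")]) - m["A", "50"]),
  int2      = (m["neutral", "100"] - m["neutral", "150"]) -
              (m["A", "100"] - m["A", "150"])
)

# side by side: model coefficients vs differences among means
cbind(lm = unname(b), by_hand)
```

The two columns should agree, which is the whole point of centered, orthogonal contrast codes in a balanced design.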

d) Latencies are not significantly shorter when Area A has been lesioned, relative to the neutral area lesion, t(24) = 1.98, p = .06. Latencies are significantly longer in the 100 and 150 msec delay conditions than in the 50 msec condition, t(24) = 2.67, p = .01; this effect is significantly larger in Area A than in the neutral area, t(24) = 3.18, p = .004. There is no significant difference in latencies in the 100 and 150 msec conditions (p = .83), and this difference did not interact significantly with lesioned area (p = .83).

```{r, results='hide'}
summary(model)
```

e) 
```{r, results='hide'}
modelEffectSizes(model)
```

The SSRs for the predictors are above.

f) See the code below.

```{r, results='hide'}
d$areaF <- as.factor(d$areaF)
d$delayF <- as.factor(d$delayF)

summary(aov(latency ~ areaF*delayF, d))
```

g) For areaC, SSR = 100.83, which is the same as the Sum Sq value for the area factor in the ANOVA. The SSR values for delayC1 and delayC2 sum (183.75 + 1.25 = 185) to the Sum Sq value for the delay factor in the ANOVA. The SSR values for int1 and int2 sum (260.42 + 1.25 = 261.67) to the Sum Sq value for the interaction in the ANOVA.
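
The additivity above is not a coincidence. As a rough sketch, the chunk below simulates a balanced 2 x 3 design (the data are random numbers, not the avoidance data, and `toy`, `reg_ss`, and `fac_ss` are names invented here) and checks that the sums of squares for the orthogonal contrast codes add up to the factor sums of squares. It uses `anova()` on `lm()` fits in base R rather than `modelEffectSizes()`; with a balanced design and orthogonal codes the sequential sums of squares it reports should match the SSRs.

```{r}
set.seed(6)  # arbitrary seed; simulated data for illustration only
toy <- expand.grid(rep   = 1:5,
                   area  = c("neutral", "A"),
                   delay = c("50", "100", "150"))
toy$y <- rnorm(nrow(toy), mean = 20, sd = 5)

# same contrast codes as in part b
toy$areaC   <- ifelse(toy$area == "neutral", 1/2, -1/2)
toy$delayC1 <- ifelse(toy$delay == "50", -2/3, 1/3)
toy$delayC2 <- ifelse(toy$delay == "100", 1/2,
                      ifelse(toy$delay == "150", -1/2, 0))
toy$int1 <- toy$areaC * toy$delayC1
toy$int2 <- toy$areaC * toy$delayC2

# sums of squares from the contrast-coded regression
reg <- anova(lm(y ~ areaC + delayC1 + delayC2 + int1 + int2, toy))
reg_ss <- setNames(reg$`Sum Sq`, rownames(reg))

# sums of squares from the factorial model
fac <- anova(lm(y ~ area * delay, toy))
fac_ss <- setNames(fac$`Sum Sq`, rownames(fac))

# contrast SS (or their sums) next to the factor SS
rbind(
  area        = c(contrasts = unname(reg_ss["areaC"]),
                  factor    = unname(fac_ss["area"])),
  delay       = c(unname(reg_ss["delayC1"] + reg_ss["delayC2"]),
                  unname(fac_ss["delay"])),
  interaction = c(unname(reg_ss["int1"] + reg_ss["int2"]),
                  unname(fac_ss["area:delay"]))
)
```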

h) The code below might be overboard, but I want to avoid problems with rounding. I extracted the SSR values and the effect-size measures and stashed them in a data frame, and I put SSE and SST in variables. Then I calculated the effect-size measures by hand so that you can see them side by side with the ones from modelEffectSizes. Yeah, it's overboard, but it keeps me sharp.

```{r, results='hide'}
# build a data frame by first pulling out the pieces of interest from the modelEffectSizes
d2 <- data.frame(modelEffectSizes(model)$Effects)
# and then get rid of the intercept row
d2 <- d2[rownames(d2) != "(Intercept)", ]

# put SSE & SST in variables
SSE <- modelEffectSizes(model)$SSE
SST <- modelEffectSizes(model)$SST

# calculate effect-sizes "by hand" and add them to the data
d2 <- d2 %>% 
	mutate(partialEta = SSR / (SSR + SSE),
				 deltaR = SSR / SST)

# show off
d2

```

```{r, echo=FALSE}
d2
```
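
In case the two formulas are easier to follow with round numbers, here is a small sketch of the same arithmetic. Everything in it is hypothetical: the sums of squares are made up to keep the division easy, and `ssr_toy`, `sse_toy`, `sst_toy`, and `es_toy` are names invented for this example. Partial eta-squared is SSR / (SSR + SSE) and delta R-squared is SSR / SST, exactly as in the `mutate()` call above.

```{r}
# HYPOTHETICAL sums of squares, not from the avoidance data
ssr_toy <- c(x1 = 10, x2 = 30)  # SSR for two made-up predictors
sse_toy <- 60                   # error sum of squares
sst_toy <- 100                  # total sum of squares

es_toy <- data.frame(
  SSR        = ssr_toy,
  partialEta = ssr_toy / (ssr_toy + sse_toy),  # 10/70 and 30/90
  deltaR     = ssr_toy / sst_toy               # 10/100 and 30/100
)
es_toy
```

Note that the two measures share a numerator and differ only in the denominator, so partial eta-squared can never be smaller than delta R-squared for the same predictor.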


i) Part j is answered here, too. See the code below.

```{r, results='hide'}
# dummy codes
d <- d %>% 
	mutate(areaD = ifelse(areaF == "neutral", 0, 1),
				 delayD1 = ifelse(delayF == "100", 1, 0),
				 delayD2 = ifelse(delayF == "150", 1, 0),
				 intD1 = areaD*delayD1,
				 intD2 = areaD*delayD2)

modelDummy <- lm(latency ~ areaD + delayD1 + delayD2 + intD1 + intD2, d)
coef(modelDummy)
```

Now the parameter estimates are about cell means. The intercept is the mean of the double-reference group (neutral-50). The areaD slope (-12) is the difference between the means of Area A-50 (14) and neutral-50 (26). The delayD1 slope is the difference between the means of neutral-100 (25) and neutral-50 (26); the delayD2 slope is the difference between the means of neutral-150 (25) and neutral-50 (26). The intD1 slope is the difference between the 100 vs 50 msec difference in Area A (25 - 14 = 11) and the same difference in the neutral area (25 - 26 = -1), so it is 11 - (-1) = 12; the intD2 slope is the corresponding difference for the 150 vs 50 msec comparison.
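
As a quick check on this reading of the dummy codes, the sketch below fits the dummy-coded model to the six cell means alone. The means for neutral-50, neutral-100, neutral-150, Area A-50, and Area A-100 are the ones quoted in the paragraph above; the Area A-150 mean (26) is not given in this key and is an assumption inferred from the SSRs in part g, so treat the intD2 value it produces accordingly. `cellsD` and `bD` are names invented for this illustration.

```{r}
# Cell means: the Area A, 150 msec value (26) is an ASSUMPTION (see text above)
cellsD <- data.frame(
  area  = rep(c("neutral", "A"), each = 3),
  delay = rep(c("50", "100", "150"), times = 2),
  M     = c(26, 25, 25, 14, 25, 26)
)

# same dummy codes as in the chunk above; neutral-50 is the reference cell
cellsD$areaD   <- ifelse(cellsD$area == "neutral", 0, 1)
cellsD$delayD1 <- ifelse(cellsD$delay == "100", 1, 0)
cellsD$delayD2 <- ifelse(cellsD$delay == "150", 1, 0)
cellsD$intD1 <- cellsD$areaD * cellsD$delayD1
cellsD$intD2 <- cellsD$areaD * cellsD$delayD2

# six means, six parameters: coefficients are simple differences among cells
bD <- coef(lm(M ~ areaD + delayD1 + delayD2 + intD1 + intD2, cellsD))
round(bD, 2)
```

Unlike the contrast-coded slopes, the areaD, delayD1, and delayD2 slopes here are simple effects at the reference level of the other factor, not main effects.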