---
title: "Problem Set 1 Answer Key"
output: html_document
---

```{r setup, include = FALSE}
options(digits = 3)
knitr::opts_chunk$set(echo = TRUE, fig.width = 4, fig.height = 4)
```

```{r, message = FALSE}
library(tidyverse)
library(lmSupport)
library(psych)
library(car)
options(knitr.kable.NA = '')
```

# 1

The relationship depicted below is non-monotonic. Darts performance gets better (scores get lower) with increasing BAC until BAC $\approx$ .06, at which point it gets worse with increasing BAC. 

```{r, results = 'hide', message = FALSE}
darts <- read_csv("https://whlevine.hosted.uark.edu/psyc5143/darts.csv")

ggplot(darts, aes(x = BAC, y = Darts)) +
	geom_point() +
	geom_smooth(method = 'loess', se = FALSE)
```

# 2
```{r}
darts <- darts %>% 
	mutate(BAC.c = BAC - mean(BAC))

BAC.m <- mean(darts$BAC)

linear.model <- lm(Darts ~ BAC.c, darts)

linear.parameters <- coef(linear.model)
```

The y-intercept is `r linear.parameters[1]`, which is the predicted Cricket score for someone with a mean BAC of `r BAC.m`. The slope of the linear-only model is `r linear.parameters[2]`, which is the value by which Cricket scores are expected to increase per 1 percentage-point increase in BAC. To make this easier to interpret, we can divide the slope by 100, making it `r linear.parameters[2]/100`, which is the predicted increase in Cricket scores per .01 increase in BAC.
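
The rescaling can be checked directly: expressing a predictor in units 100 times smaller divides its slope by 100. Below is a minimal sketch that uses the built-in `mtcars` data as a stand-in for the darts data; the names `fit_raw`, `fit_scaled`, and `wt100` are made up for illustration.

```{r}
# mtcars stands in for darts here; wt plays the role of BAC.c
fit_raw <- lm(mpg ~ wt, data = mtcars)

# the same predictor expressed in units 100 times smaller
mtcars_scaled <- transform(mtcars, wt100 = wt * 100)
fit_scaled <- lm(mpg ~ wt100, data = mtcars_scaled)

slope_raw <- unname(coef(fit_raw)["wt"])
slope_scaled <- unname(coef(fit_scaled)["wt100"])

slope_raw / 100   # slope per 0.01 unit, computed by hand
slope_scaled      # the same value, estimated directly
```

Only the units of the slope change; the fitted line and the *SSE* are identical in both versions.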

# 3
```{r}
modelSummary(linear.model)
```

*SSE* = 6265.1

# 4
```{r}
darts <- darts %>% 
	mutate(BAC.c.sq = BAC.c^2)

model.quad1 <- lm(Darts ~ BAC.c + BAC.c.sq, darts)
model.quad2 <- lm(Darts ~ BAC.c + I(BAC.c^2), darts)
coef(model.quad1)
coef(model.quad2) # same results! w00t!

quadratic.parameters <- coef(model.quad1)
```

The intercept of the quadratic model is `r quadratic.parameters[1]`, which is the predicted Cricket score for someone with a mean level of BAC. 

Why is the y-intercept notably different from the corresponding answer to #2? It's because the nature of the model (linear vs. quadratic) makes a big difference in what's expected to happen at various BAC levels. The graph below illustrates this nicely.
```{r}
# visualizing why the y-intercept is different for the linear and quadratic
# models
ggplot(darts, aes(x = BAC, y = Darts)) +
	geom_point() +
	geom_smooth(method = 'lm', formula = y ~ x, se = FALSE) +
	geom_smooth(method = 'lm', formula = y ~ poly(x, 2), se = FALSE) +
	geom_vline(xintercept = BAC.m)
```

The linear slope for the quadratic model is `r quadratic.parameters[2]`, which is the simple or point slope of the BAC-Darts relationship at the mean BAC value, illustrated by the graph below.
```{r}
ggplot(darts, aes(x = BAC, y = Darts)) +
	geom_point() +
	geom_smooth(method = 'lm', formula = y ~ poly(x, 2), se = FALSE, col = "black") +
	geom_vline(xintercept = BAC.m, col = "black") +
	# tangent line at mean BAC, re-expressed on the uncentered BAC axis
	geom_abline(slope = quadratic.parameters[2],
		intercept = quadratic.parameters[1] + quadratic.parameters[2]*(-BAC.m),
		col = "red")
```

The quadratic slope for the quadratic model is `r quadratic.parameters[3]`, which is half the rate at which the linear slope increases for every 1 percentage-point increase in BAC (the linear slope itself changes by twice this value). It's a very large number because the linear slope changes rapidly (from very negative until BAC $\approx$ .05 to very positive when BAC $\gt$ .10 or so) and because BAC is typically measured in hundredths of a percentage point.
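
To see where the factor of 2 comes from, write the quadratic model with generic coefficients (the symbols $b_0$, $b_1$, and $b_2$ are notation introduced here for illustration, not names from the R output):

$\hat{Darts} = b_0 + b_1 \times BAC_c + b_2 \times BAC_c^2$

The point slope at any centered BAC value is the derivative with respect to $BAC_c$:

$slope = b_1 + 2 \times b_2 \times BAC_c$

So the slope equals $b_1$ at the mean ($BAC_c = 0$) and changes by $2 \times b_2$ per unit of BAC. Setting the slope to zero gives the turning point of the fitted curve:

$BAC_c = -b_1 / (2 \times b_2)$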

# 5
```{r}
modelSummary(model.quad1, t = FALSE)
```
*SSE* = 3454

# 6
```{r}
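# proportional reduction in error: (SSE linear - SSE quadratic) / SSE linear
PRE <- (6265.1 - 3454) / 6265.1
# F for 1 added parameter, with 97 error df in the quadratic model
Fquad <- PRE / ((1 - PRE)/97)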
```

Based on the *SSE* values above, *F* = `r Fquad`, which matches the F-ratio for the quadratic term in the quadratic model.
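
The same comparison can be computed from fitted model objects rather than typed-in *SSE* values. The sketch below is one way to do it in base R; `pre_f` is a hypothetical helper written for this illustration (it is not part of `lmSupport`), and the built-in `mtcars` data stands in for the darts data.

```{r}
# Hypothetical helper: PRE and F for two nested lm fits
pre_f <- function(compact, augmented) {
  sse_c <- deviance(compact)      # SSE of the compact model
  sse_a <- deviance(augmented)    # SSE of the augmented model
  df_num <- df.residual(compact) - df.residual(augmented)  # added parameters
  df_den <- df.residual(augmented)                         # error df
  pre <- (sse_c - sse_a) / sse_c
  f <- (pre / df_num) / ((1 - pre) / df_den)
  c(PRE = pre, F = f)
}

# Illustration on mtcars: linear vs. quadratic effect of hp on mpg
compact <- lm(mpg ~ hp, data = mtcars)
augmented <- lm(mpg ~ hp + I(hp^2), data = mtcars)

pre_f_out <- pre_f(compact, augmented)
pre_f_out
anova(compact, augmented)$F[2]  # model-comparison F from anova(), for comparison
```

With the models fitted earlier in this document, the equivalent call would be `pre_f(linear.model, model.quad1)`.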

# 7

The graph in #4 does this!

# 8

The linear model shows a significant positive linear relationship between BAC and Cricket score: as BAC goes up, so does the predicted Cricket score (which means that performance gets worse). This picture is complicated by the nonlinear relationship. The quadratic model, which accommodates that nonlinearity, shows that the slope relating BAC to Cricket score becomes more positive as BAC increases. An inspection of the graph of this relationship shows that the slope starts out negative (which is good if one wants to succeed at Cricket) but becomes positive after a certain BAC is reached (which is bad if one wants to succeed at Cricket).

# 9

a) 

The *y*-intercept of 5 is the predicted performance of someone with a mean level of stress.

The -0.4 linear slope of $stress_c$ indicates that for someone with a mean level of stress, performance is predicted to decline 0.4 points per unit of stress.

The -0.2 quadratic slope of $stress_c^2$ indicates that the linear slope gets 2 $\times$ 0.2 = 0.4 more negative per unit of increased stress.

b) 

5.2 (see below)

$\hat{performance} = 5 - 0.4 \times stress_c - 0.2 \times stress_c^2$

With stress = 3 (i.e., centered stress = -1)

$\hat{performance} = 5 - 0.4 \times (-1) - 0.2 \times (-1)^2$

$\hat{performance} = 5 + 0.4 - 0.2 \times 1$

$\hat{performance} = 5 + 0.4 - 0.2 = 5.2$
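
The arithmetic can also be checked in R. This is a minimal sketch; `predict_performance` and `stress_mean` are names made up for illustration, and the mean of 4 is inferred from stress = 3 corresponding to a centered value of -1. Note that R evaluates `-1^2` as `-(1^2)`, so the parentheses matter when squaring a negative number by hand.

```{r}
# the quadratic model from #9, written as a function of centered stress
predict_performance <- function(stress_c) {
  5 - 0.4 * stress_c - 0.2 * stress_c^2
}

stress_mean <- 4   # implied by stress = 3 being centered at -1
pred_3 <- predict_performance(3 - stress_mean)
pred_3             # 5.2

(-1)^2             # 1: the squared centered score the model needs
-1^2               # -1: exponentiation binds tighter than unary minus
```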

c) 

-0.8 (see below)

The linear slope at any given point is generated by

$slope = -0.4 - 0.4 \times stress_c$

So if stress = 5 (i.e., centered stress = +1)

$slope = -0.4 - 0.4 \times 1$

$slope = -0.4 - 0.4 = -0.8$
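
As a quick numerical check, the analytic point slope can be compared with a central finite difference of the model equation. This is a sketch only; `performance`, `point_slope`, and the step size `h` are choices made here for illustration.

```{r}
# quadratic model from #9 and its analytic point slope (the derivative)
performance <- function(stress_c) 5 - 0.4 * stress_c - 0.2 * stress_c^2
point_slope <- function(stress_c) -0.4 - 0.4 * stress_c

# central finite difference at centered stress = +1, as an independent check
h <- 1e-6
numeric_slope <- (performance(1 + h) - performance(1 - h)) / (2 * h)

point_slope(1)   # -0.8
numeric_slope    # approximately -0.8
```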

# Bonus?

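To help visualize the answers to #9, see the two graphs below. The first marks the predicted performance at centered stress = -1; the second shows the tangent line at centered stress = +1.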

```{r}
x <- -5:5  # some x values
dat <- data.frame(x, y = 5 - 0.4*x - 0.2*x^2) # a data frame
f <- function(x) 5 - 0.4*x - 0.2*x^2  # make the equation a function

# the plot
ggplot(dat, aes(x, y)) +
	xlab("centered stress") +
	stat_function(fun = f, col = "red") +
	geom_vline(xintercept = -1)              # predicted performance for stress.c = -1
	
ggplot(dat, aes(x, y)) +
	xlab("centered stress") +
	stat_function(fun = f, col = "red") +
	geom_vline(xintercept = 1) +
	geom_vline(xintercept = 0) +
	geom_abline(slope = -0.8, intercept = 5.2, col = "blue") # the tangent line at stress.c = +1 with a slope of -0.8

```