library(tidyverse)
library(rockchalk)
library(pwr)
library(psych)
library(car)

1

n1 <- read.csv("https://whlevine.hosted.uark.edu/psyc5133/partials.csv")

# a
var_y <- var(n1$y)

# b
n1model <- lm(y ~ x, n1)

# saving residuals in the data
n1 <- n1 %>% 
    mutate(e = resid(n1model))

# storing stuff for Markdown purposes
n1storage <- summary(n1model) # summary.lm() has no t argument; it was silently ignored
R2 <- n1storage$r.squared                # R-squared
SE_e <- n1storage$sigma                  # SE of the residuals
MSE <- anova(n1model)$`Mean Sq`[2] # MSE

# c
var_e <- var(n1$e)

# d
ratio_of_variances <- var_e / var_y 
one_minus_R2 <- 1 - R2

# e
resid_y_w <- resid(lm(y ~ w, n1))
resid_x_w <- resid(lm(x ~ w, n1))
r_y_w_with_x_w <- cor(resid_y_w, resid_x_w)

# f
pr_xy_w <- partial.r(data = n1,
                     x = c("y", "x"),
                     y = "w")[2]      # gets the second value from the output
  1. The variance of y is 8320.482.

  2. \(R^2\) = 0.291, \(s_{y.x}\) = 77.195, and \(MSE\) = 5959.092. \(MSE\) is the square of \(s_{y.x}\).

  3. The variance of the residuals is 5898.899.

  4. The ratio of the variance of the residuals to the variance of y is 0.709, which is equal to \(1 - R^2\) = 1 - 0.291.

  5. The correlation between the residuals of y with w partialed out and the residuals of x with w partialed out is 0.419.

  6. The partial correlation between x and y controlling for w is 0.419, the same as the result in part e.
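
The identities in answers 4 through 6 are exact, not approximate, and can be checked in base R. The sketch below uses simulated data (the names mirror the problem, but this is made-up data, not partials.csv):

```r
# Simulated stand-ins for y, x, and w (not the course data).
set.seed(1)
w <- rnorm(100)
x <- 0.5 * w + rnorm(100)
y <- 0.4 * x + 0.3 * w + rnorm(100)

# Part d's identity: var(residuals) / var(y) = 1 - R^2, exactly,
# because both sides reduce to SSE / SST.
fit <- lm(y ~ x)
ratio <- var(resid(fit)) / var(y)
stopifnot(isTRUE(all.equal(ratio, 1 - summary(fit)$r.squared)))

# Parts e/f: correlating the two sets of residuals reproduces the
# textbook partial-correlation formula.
e_y.w <- resid(lm(y ~ w))
e_x.w <- resid(lm(x ~ w))
r_xy <- cor(x, y); r_xw <- cor(x, w); r_yw <- cor(y, w)
pr_formula <- (r_xy - r_xw * r_yw) / sqrt((1 - r_xw^2) * (1 - r_yw^2))
stopifnot(isTRUE(all.equal(cor(e_y.w, e_x.w), pr_formula)))
```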

2

# storing various things in variables (like params2c and sr2) isn't necessary; I do it because I'm creating this answer key using HTML

n2 <- read.csv("https://whlevine.hosted.uark.edu/psyc5143/ps1.csv")

# a
n2 <- n2 %>% 
    mutate(e_X1.X2 = resid(lm(x1 ~ x2, n2)))

# b
n2 <- n2 %>% 
    mutate(e_X2.X1 = resid(lm(x2 ~ x1, n2)))

# c
params2c <- coef(lm(y ~ e_X1.X2, n2))

# d
params2d <- coef(lm(y ~ e_X2.X1, n2))

# e
params2e <- coef(lm(y ~ x1 + x2, n2))

# g (dunno why I separated this from part k)
n2 <- n2 %>% 
    mutate(z.y = scale(y),
           z.x1 = scale(x1),
           z.x2 = scale(x2))

# h 
sr2 <- getDeltaRsquare(lm(y ~ x1 + x2, n2))


# i
squared_r_part_i <- cor(n2$e_X1.X2, n2$y)^2

# j
squared_r_part_j <- cor(n2$e_X2.X1, n2$y)^2

# k
std_slopes <- coef(lm(z.y ~ z.x1 + z.x2, n2))[2:3] # the 2nd and 3rd coefficients
                                                   # are the slopes; the 1st is the intercept

# l
R2model <- summary(lm(y ~ x1 + x2, n2))$r.squared
sr2sum <- sum(sr2)
region_b <- R2model - sr2sum
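
As an aside on parts g and k: the standardized slopes can also be recovered from the raw slopes by rescaling, \(\beta_j = b_j \, s_{x_j} / s_y\). A quick check on simulated data (not the course data):

```r
# Simulated data; the point is only the identity, not the numbers.
set.seed(4)
p <- rnorm(120)
q <- rnorm(120)
out <- 0.5 * p + 0.2 * q + rnorm(120)

raw_slopes <- coef(lm(out ~ p + q))[2:3]

# z-scoring every variable and refitting gives the standardized slopes
zy <- as.numeric(scale(out))
zp <- as.numeric(scale(p))
zq <- as.numeric(scale(q))
std_slopes <- coef(lm(zy ~ zp + zq))[2:3]

# beta_j = b_j * sd(x_j) / sd(y), exactly
stopifnot(isTRUE(all.equal(
  unname(std_slopes),
  unname(raw_slopes) * c(sd(p), sd(q)) / sd(out)
)))
```
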
  1. See above.

  2. See above.

  3. The intercept is 41.163 and the slope is 0.425.

  4. The intercept is 41.163 and the slope is 0.53.

  5. The intercept is 41.163 and the slopes are 0.425 for x1 and 0.53 for x2.

  6. Everything is the same! The intercept and both slopes from the simultaneous regression match the values from the residual-based regressions in parts c and d.

  7. See above.

  8. The \(sr^2\) values are 0.046 for X1 and 0.06 for X2. By a small amount, X2 appears to be the more important predictor.

  9. The squared correlation of Y and the residuals of X1 regressed on X2 is 0.046. This is the same as the \(sr^2\) value for X1 in part h.

  10. The squared correlation of Y and the residuals of X2 regressed on X1 is 0.06. This is the same as the \(sr^2\) value for X2 in part h.

  11. The standardized slopes of X1 and X2 are 0.214 and 0.247, respectively. Just as in part h, X2 appears to be the more important predictor by a small amount.

  12. \(R^2\) for the model is 0.117. Subtracting both \(sr^2\) values from this leaves 0.011, the portion of \(R^2\) attributable to the region shared by X1 and X2.
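
The matches observed in parts c through e and in h through j are exact algebraic identities (the Frisch-Waugh-Lovell result for the slopes, and the change-in-\(R^2\) definition of the squared semipartial). A base-R sketch on simulated data (not ps1.csv) demonstrates both:

```r
# Simulated stand-ins for y, x1, and x2 (not the course data).
set.seed(2)
x2 <- rnorm(200)
x1 <- 0.6 * x2 + rnorm(200)
y  <- 0.4 * x1 + 0.3 * x2 + rnorm(200)

# Frisch-Waugh-Lovell: the slope on residualized x1 equals the
# multiple-regression slope for x1.
e_x1.x2 <- resid(lm(x1 ~ x2))
stopifnot(isTRUE(all.equal(
  unname(coef(lm(y ~ e_x1.x2))[2]),
  unname(coef(lm(y ~ x1 + x2))[2])
)))

# The squared semipartial for x1 is the drop in R^2 from removing x1,
# which is what getDeltaRsquare() computes.
sr2_x1 <- cor(y, e_x1.x2)^2
dR2 <- summary(lm(y ~ x1 + x2))$r.squared - summary(lm(y ~ x2))$r.squared
stopifnot(isTRUE(all.equal(sr2_x1, dR2)))
```
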

3

n3 <- read.csv("http://whlevine.hosted.uark.edu/psyc5143/mireault.csv")

# a
model3a <- lm(depresst ~ pvloss, n3)
summary(model3a)

# b
model3b <- lm(depresst ~ pvloss + pvtot, n3)
summary(model3b)
vif(model3b)
cor(n3$pvtot, n3$pvloss)

# c
model3c <- lm(depresst ~ pvloss + supptotl, n3)
summary(model3c)
getDeltaRsquare(model3c)

# d
cor3d <- cor(n3$pvloss, n3$supptotl)

# e
model3e <- lm(depresst ~ pvloss + supptotl + ageatlos, n3)
summary(model3e)
  1. Perceived vulnerability to loss is a positive and significant predictor of depression, b = 0.61, SE = 0.07, t(373) = 8.7, \(R^2\) = .169, p < .001.

  2. Now the value of \(R^2\) is .17, barely different from the model without pvtot. Worse, the SE for perceived vulnerability to loss went up from 0.07 to 0.12, suggesting a less useful estimate of this parameter. With the larger SE, power is reduced (though this is moot in this case because the effect is still significant). The issue is that pvloss and pvtot are closely related, with a correlation of .79, which translates to a VIF of 2.7, high enough to take notice of if not to worry about. All said, including pvtot in the model doesn't do much to \(R^2\), reduces our trust in the estimate of pvloss's parameter, and wastes a df.

  3. Perceived vulnerability to loss and a person's level of social support together explain a significant proportion of variability (\(R^2\) = .23) in depression, F(2, 372) = 55.3, p < .001. Controlling for level of social support, perceived vulnerability to loss is positively and significantly related to depression, b = 0.59, t(372) = 8.7, p < .001, \(sr^2\) = .157. Controlling for perceived vulnerability to loss, level of social support is negatively and significantly related to depression, b = -0.19, t(372) = -5.39, p < .001, \(sr^2\) = .06.

  4. The slope of pvloss does change slightly from part a to part c. This is because it’s slightly correlated (r = -0.044) with the other predictor (supptotl) in part c, so its slope is no longer “purely” about perceived vulnerability to loss; it’s about partialed perceived vulnerability to loss.

  5. The value of \(R^2\) for the whole model is .244 (up slightly from .229 for the model in part c), and this is significantly different from 0, F(3, 131) = 14.1, p < .001. Perceived vulnerability to loss is a significant positive predictor of depression, controlling for the other predictors, b = 0.615, t(131) = 5.86, p < .001, \(sr^2\) = .20. Social support is a significant negative predictor of depression, controlling for the other predictors, b = -0.164, t(131) = -3.09, p = .002, \(sr^2\) = .055. Age at loss is not significantly related to depression after controlling for the other predictors, b = -0.107, t(131) = -0.68, p = .50. (Note that the df here is radically reduced; this is because of lots of missing data.)
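
A footnote to answer 2's VIF: with exactly two predictors, each VIF equals \(1/(1 - r^2)\), where r is the correlation between the predictors, so r = .79 gives 1/(1 - .6241), roughly 2.7. A quick check with car::vif on simulated data (not the Mireault data):

```r
library(car)  # for vif()

set.seed(3)
p1 <- rnorm(150)
p2 <- 0.79 * p1 + rnorm(150)  # two correlated predictors
out <- p1 + p2 + rnorm(150)

m <- lm(out ~ p1 + p2)

# With two predictors, both VIFs equal 1 / (1 - r^2) exactly.
stopifnot(isTRUE(all.equal(unname(vif(m)[1]),
                           1 / (1 - cor(p1, p2)^2))))

# The value reported above: r = .79 implies a VIF of about 2.66.
round(1 / (1 - 0.79^2), 2)
```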