library(tidyverse)
library(lmSupport)
library(psych)
library(car)
library(kableExtra) # for tables!
options(knitr.kable.NA = '')
n1 <- read.csv("https://whlevine.hosted.uark.edu/psyc5143/pick-up.csv", stringsAsFactors = T)
# dummy coding
n1 <- n1 %>%
mutate(att = ifelse(attract == "not attractive", 0, 1),
app = ifelse(approach == "casual", 0, 1))
# checking (optional but probably wise)
table(n1$att, n1$app)
table(n1$attract, n1$att)
table(n1$approach, n1$app)
# getting some means
aggregate(time ~ attract * approach, n1, mean)
# means again
aggregate(time ~ attract * approach, n1, mean)
aggregate(time ~ attract, n1, mean)
aggregate(time ~ approach, n1, mean)
# modeling
model1 <- lm(time ~ att * app, n1)
summary(model1)
anova(model1)
##          attract approach time
## 1     attractive   casual 41.6
## 2 not attractive   casual 18.8
## 3     attractive    humor 48.0
## 4 not attractive    humor 46.8
##   approach time
## 1   casual 30.2
## 2    humor 47.4
##          attract time
## 1     attractive 44.8
## 2 not attractive 32.8
In sum: If one is attractive, it doesn't much matter which approach one uses when trying to meet attractive others. But if one is less attractive, go with humor.
deviance(aov(time ~ attract*approach, n1)) # SSE = 1143
n1 <- n1 %>%
mutate(
con1 = ifelse(group == "control", 3/4, -1/4),
con2 = case_when(group == "control" ~ 0,
group == "only attractive" ~ -2/3,
group == "combo" ~ 1/3,
group == "only humor" ~ 1/3),
con3 = case_when(group == "control" ~ 0,
group == "only attractive" ~ 0,
group == "combo" ~ 1/2,
group == "only humor" ~ -1/2)
)
conModel <- lm(time ~ con1 + con2 + con3, n1)
deviance(conModel) #1143
They are the same because any three non-redundant predictors - dummy codes, contrast codes, etc. - will fully code membership in the four groups. The answer to the next question might provide an additional way to conceptualize why the SSE values are the same.
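A quick numerical check (not part of the answer, just to see all three SSEs side by side):
# the deviance (i.e., SSE) is identical across the three codings of the same four groups
c(dummy    = deviance(model1),
  anova    = deviance(aov(time ~ attract*approach, n1)),
  contrast = deviance(conModel)) # all three are 1143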
R code below. The predictions are the same no matter the model, and in every case the predicted score for a group is that group's mean. The best-fitting model for a group of scores uses the group mean to predict those scores, and each of the models (dummy-coded, ANOVA, contrast-coded, etc.) is doing just that.
# I've added the predictions to the data
n1 <- n1 %>%
mutate(model.b.predictions = predict(model1),
model.c.predictions = predict(aov(time ~ attract*approach, n1)),
model.d.predictions = predict(conModel))
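A quick check that the three prediction columns really are identical:
# both comparisons should return TRUE (up to floating-point error)
all.equal(n1$model.b.predictions, n1$model.c.predictions)
all.equal(n1$model.b.predictions, n1$model.d.predictions)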
n2 <- read.csv("https://whlevine.hosted.uark.edu/psyc5143/exercise2.csv", stringsAsFactors = TRUE)
# a
model2a <- lm(yendu ~ xage + zexer, n2)
coef(model2a)
# b
n2 <- n2 %>%
mutate(xage.c = xage - mean(xage),
zexer.c = zexer - mean(zexer))
model2b <- lm(yendu ~ xage.c + zexer.c, n2)
coef(model2b)
# c
model2c <- lm(yendu ~ xage.c * zexer.c, n2)
coef(model2c)
# d
n2 <- n2 %>%
mutate(ageLow = xage - (mean(xage) - sd(xage)),
exerLow = zexer - (mean(zexer) - sd(zexer)))
model2d <- lm(yendu ~ ageLow * exerLow, n2)
coef(model2d)
# e
n2 <- n2 %>%
mutate(ageHi = xage - (mean(xage) + sd(xage)),
exerHi = zexer - (mean(zexer) + sd(zexer)))
model2e <- lm(yendu ~ ageHi * exerHi, n2)
coef(model2e)
# f (cell means; the extra stuff below is to create a table of means to "print")
aggregate(yendu ~ ageGroup * exerGroup, n2, mean) %>%
pivot_wider(id_cols = ageGroup,
names_from = exerGroup,
values_from = yendu) -> table2f
# what is pivot_wider? it takes data that is in many rows and transposes some
# of its information into columns
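A tiny made-up example (hypothetical data, not from the homework) may make pivot_wider clearer:
tibble(g    = c("a", "a", "b", "b"),
       cond = c("x", "y", "x", "y"),
       m    = c(1, 2, 3, 4)) %>%
  pivot_wider(id_cols = g, names_from = cond, values_from = m)
# four rows become two; the values of cond ("x", "y") become column names
# g     x     y
# a     1     2
# b     3     4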
# g (marginal means for age groups)
aggregate(yendu ~ ageGroup, n2, mean) -> table2g
# h (marginal means for the exercise groups)
aggregate(yendu ~ exerGroup, n2, mean) -> table2h
# i
n2 <- n2 %>%
mutate(
ageC = ifelse(ageGroup == "high", 1/2, -1/2),
exerC = ifelse(exerGroup == "high", 1/2, -1/2)
)
model2i <- lm(yendu ~ ageC + exerC, n2)
coef(model2i)
# j
model2j <- lm(yendu ~ ageC * exerC, n2)
coef(model2j)
# k
n2 <- n2 %>%
mutate(
ageD = ifelse(ageGroup == "high", 1, 0),
exerD = ifelse(exerGroup == "high", 1, 0)
)
model2k <- lm(yendu ~ ageD * exerD, n2)
coef(model2k)
The slopes of age and exercise are -0.18 and 0.84, respectively.
The slopes are the same as in part a. Why? There's no interaction term in the model, so each slope is the effect of its predictor with the other held constant, and that effect is the same at any value of the other predictor; centering changes only the intercept.
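A quick side-by-side look at just the slopes (dropping the intercepts):
# the slopes are identical; centering moved only the intercept
rbind(raw      = coef(model2a)[-1],
      centered = coef(model2b)[-1])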
The slopes are: \(b_{age_c}\) = -0.18, \(b_{exercise_c}\) = 0.86, \(b_{interaction}\) = 0.0086. The slope of age is the predicted change in endurance for each one-unit increase in age, holding exercise constant at its mean. The slope of exercise is the predicted change in endurance for a one-unit increase in exercise, holding age constant at its mean. The slope of the interaction is the increase in either one of these simple slopes as the other variable increases by one.
The slopes are: \(b_{age_c}\) = -0.21, \(b_{exercise_c}\) = 0.77, \(b_{interaction}\) = 0.0086. The slope of age is the predicted change in endurance for each one-unit increase in age, holding exercise constant at 1 SD below its mean. The slope of exercise is the predicted change in endurance for a one-unit increase in exercise, holding age constant at 1 SD below its mean. The slope of the interaction is the increase in either one of these simple slopes as the other variable increases by one.
The slopes are: \(b_{age_c}\) = -0.14, \(b_{exercise_c}\) = 0.94, \(b_{interaction}\) = 0.0086. The slope of age is the predicted change in endurance for each one-unit increase in age, holding exercise constant at 1 SD above its mean. The slope of exercise is the predicted change in endurance for a one-unit increase in exercise, holding age constant at 1 SD above its mean. The slope of the interaction is the increase in either one of these simple slopes as the other variable increases by one.
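These three sets of simple slopes are linked by simple arithmetic: re-centering one predictor at 1 SD below (or above) its mean changes the other predictor's simple slope by 1 SD times the interaction slope. A quick check using the part-c coefficients (nothing new here, just a sanity check):
b <- coef(model2c)
# age simple slope 1 SD below and 1 SD above the mean of exercise (parts d & e)
b["xage.c"] - sd(n2$zexer) * b["xage.c:zexer.c"]
b["xage.c"] + sd(n2$zexer) * b["xage.c:zexer.c"]
# exercise simple slope 1 SD below and 1 SD above the mean of age (parts d & e)
b["zexer.c"] - sd(n2$xage) * b["xage.c:zexer.c"]
b["zexer.c"] + sd(n2$xage) * b["xage.c:zexer.c"]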
| age group | hi exercise | lo exercise |
|---|---|---|
| high | 28.3 | 22.4 |
| low | 30.4 | 25.0 |

| age group | M |
|---|---|
| high | 25.4 |
| low | 27.7 |

| exercise group | M |
|---|---|
| high | 29.4 |
| low | 23.7 |
The slopes of the age and exercise contrast codes are -2.31 and 5.69, respectively. These are differences in marginal means (displayed in parts g and h, respectively).
(I've copied and pasted my answer from part c, with minor modifications. I can do this because it's essentially the same model. The only things that have changed are what an increase of "one unit" means and what the "mean" of a variable indicates. That's it.) The slopes are: \(b_{age}\) = -2.31, \(b_{exercise}\) = 5.69, \(b_{interaction}\) = 0.393. The slope of age is the predicted change in endurance for each one-unit increase in age (which means switching from young to old), holding exercise constant at its mean (which, because the codes are centered, averages over everyone). The slope of exercise is the predicted change in endurance for a one-unit increase in exercise (which means switching from low to high exercise), holding age constant at its mean (again, averaging over everyone). The slope of the interaction is the increase in either one of these simple slopes as the other variable increases by one (i.e., switches from the low to the high group).
Alternatively, the simple slopes can be interpreted as being about marginal means. The age slope is the difference between the means of the old (M = 25.4) and young (M = 27.7). The exercise slope is the difference between the means of the low exercisers (M = 23.7) and high exercisers (M = 29.4). The interaction slope is the difference of differences in cell means ((28.33 - 22.44) - (30.45 - 24.95)) = 0.39.
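These interpretations can be verified directly. (I'm assuming the factor levels of ageGroup and exerGroup are literally "high" and "low", consistent with the ifelse() calls above; adjust the labels if not.)
table2g$yendu[1] - table2g$yendu[2] # age slope: high minus low marginal mean = -2.31
table2h$yendu[1] - table2h$yendu[2] # exercise slope: high minus low marginal mean = 5.69
m <- tapply(n2$yendu, list(n2$ageGroup, n2$exerGroup), mean) # matrix of cell means
(m["high", "high"] - m["high", "low"]) - (m["low", "high"] - m["low", "low"]) # interaction: 0.39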
The slopes are: \(b_{age}\) = -2.51, \(b_{exercise}\) = 5.49, \(b_{interaction}\) = 0.393. These are now about cell means. The age slope is about the difference between young and old when exercise = 0 (i.e., for the low exercisers). The exercise slope is about the difference between high and low exercisers when age = 0 (i.e., for young participants). The interaction slope is about the same thing it was in part j.
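Using the cell-means matrix m from the check above (same level-name assumption):
m["high", "low"] - m["low", "low"] # dummy-coded age slope: old vs. young among low exercisers, about -2.51
m["low", "high"] - m["low", "low"] # dummy-coded exercise slope: high vs. low exercise among the young, about 5.49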