library(tidyverse)
library(lmSupport)
library(psych)
library(car)
library(kableExtra) # for tables!
options(knitr.kable.NA = '')
n1 <- read.csv("https://whlevine.hosted.uark.edu/psyc5143/pick-up.csv", stringsAsFactors = T)
# dummy coding
n1 <- n1 %>%
mutate(att = ifelse(attract == "not attractive", 0, 1),
app = ifelse(approach == "casual", 0, 1))
# checking (optional but probably wise)
table(n1$att, n1$app)
table(n1$attract, n1$att)
table(n1$approach, n1$app)
# getting some means
aggregate(time ~ attract * approach, n1, mean)
# means again
aggregate(time ~ attract * approach, n1, mean)
aggregate(time ~ attract, n1, mean)
aggregate(time ~ approach, n1, mean)
# modeling
model1 <- lm(time ~ att * app, n1)
summary(model1)
anova(model1)
##          attract approach time
## 1     attractive   casual 41.6
## 2 not attractive   casual 18.8
## 3     attractive    humor 48.0
## 4 not attractive    humor 46.8
##   approach time
## 1   casual 30.2
## 2    humor 47.4
##          attract time
## 1     attractive 44.8
## 2 not attractive 32.8
In sum: If one is attractive, it doesn't much matter which approach one uses when trying to meet attractive others. But if one is less attractive, go with humor.
deviance(aov(time ~ attract*approach, n1)) # SSE = 1143
n1 <- n1 %>%
mutate(
con1 = ifelse(group == "control", 3/4, -1/4),
con2 = case_when(group == "control" ~ 0,
group == "only attractive" ~ -2/3,
group == "combo" ~ 1/3,
group == "only humor" ~ 1/3),
con3 = case_when(group == "control" ~ 0,
group == "only attractive" ~ 0,
group == "combo" ~ 1/2,
group == "only humor" ~ -1/2)
)
conModel <- lm(time ~ con1 + con2 + con3, n1)
deviance(conModel) #1143
They are the same because any three non-redundant predictors - dummy codes, contrast codes, etc. - will fully code membership in the four groups. The answer to the next question might provide an additional way to conceptualize why the SSE values are the same.
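A quick numerical check (not part of the answer, just to see all three SSEs side by side):
# the deviance (i.e., SSE) is identical across the three codings of the same four groups
c(dummy    = deviance(model1),
  anova    = deviance(aov(time ~ attract*approach, n1)),
  contrast = deviance(conModel)) # all three are 1143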
R code below. The predictions are the same no matter the model, and in every case the predicted score for a group is that group's mean. The best-fitting model for a group of scores uses the group mean to predict those scores, and each of the models (dummy-coded, ANOVA, contrast-coded, etc.) is doing just that.
# I've added the predictions to the data
n1 <- n1 %>%
mutate(model.b.predictions = predict(model1),
model.c.predictions = predict(aov(time ~ attract*approach, n1)),
model.d.predictions = predict(conModel))
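A quick check that the three prediction columns really are identical:
# both comparisons should return TRUE (up to floating-point error)
all.equal(n1$model.b.predictions, n1$model.c.predictions)
all.equal(n1$model.b.predictions, n1$model.d.predictions)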
n2 <- read.csv("https://whlevine.hosted.uark.edu/psyc5143/exercise2.csv", stringsAsFactors = TRUE)
# a
model2a <- lm(yendu ~ xage + zexer, n2)
coef(model2a)
# b
n2 <- n2 %>%
mutate(xage.c = xage - mean(xage),
zexer.c = zexer - mean(zexer))
model2b <- lm(yendu ~ xage.c + zexer.c, n2)
coef(model2b)
# c
model2c <- lm(yendu ~ xage.c * zexer.c, n2)
coef(model2c)
# d
n2 <- n2 %>%
mutate(ageLow = xage - (mean(xage) - sd(xage)),
exerLow = zexer - (mean(zexer) - sd(zexer)))
model2d <- lm(yendu ~ ageLow * exerLow, n2)
coef(model2d)
# e
n2 <- n2 %>%
mutate(ageHi = xage - (mean(xage) + sd(xage)),
exerHi = zexer - (mean(zexer) + sd(zexer)))
model2e <- lm(yendu ~ ageHi * exerHi, n2)
coef(model2e)
# f (cell means; the extra stuff below is to create a table of means to "print")
aggregate(yendu ~ ageGroup * exerGroup, n2, mean) %>%
pivot_wider(id_cols = ageGroup,
names_from = exerGroup,
values_from = yendu) -> table2f
# what is pivot_wider? it takes data that is in many rows and transposes some
# of its information into columns
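A tiny made-up example (hypothetical data, not from the homework) may make pivot_wider clearer:
tibble(g    = c("a", "a", "b", "b"),
       cond = c("x", "y", "x", "y"),
       m    = c(1, 2, 3, 4)) %>%
  pivot_wider(id_cols = g, names_from = cond, values_from = m)
# four rows become two; the values of cond ("x", "y") become column names
# g     x     y
# a     1     2
# b     3     4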
# g (marginal means for age groups)
aggregate(yendu ~ ageGroup, n2, mean) -> table2g
# h (marginal means for the exercise groups)
aggregate(yendu ~ exerGroup, n2, mean) -> table2h
# i
n2 <- n2 %>%
mutate(
ageC = ifelse(ageGroup == "high", 1/2, -1/2),
exerC = ifelse(exerGroup == "high", 1/2, -1/2)
)
model2i <- lm(yendu ~ ageC + exerC, n2)
coef(model2i)
# j
model2j <- lm(yendu ~ ageC * exerC, n2)
coef(model2j)
# k
n2 <- n2 %>%
mutate(
ageD = ifelse(ageGroup == "high", 1, 0),
exerD = ifelse(exerGroup == "high", 1, 0)
)
model2k <- lm(yendu ~ ageD * exerD, n2)
coef(model2k)
The slopes of age and exercise are -0.18 and 0.84, respectively.
The slopes are the same as in part a. Why? There's no interaction term in the model, so each slope is the effect of its predictor with the other held constant, and that effect is the same at any value of the other predictor; centering changes only the intercept.
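A quick side-by-side look at just the slopes (dropping the intercepts):
# the slopes are identical; centering moved only the intercept
rbind(raw      = coef(model2a)[-1],
      centered = coef(model2b)[-1])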
The slopes are: \(b_{age_c}\) = -0.18, \(b_{exercise_c}\) = 0.86, \(b_{interaction}\) = 0.0086. The slope of age is the predicted change in endurance for each one-unit increase in age, holding exercise constant at its mean. The slope of exercise is the predicted change in endurance for a one-unit increase in exercise, holding age constant at its mean. The slope of the interaction is the increase in either one of these simple slopes as the other variable increases by one.
The slopes are: \(b_{age_c}\) = -0.21, \(b_{exercise_c}\) = 0.77, \(b_{interaction}\) = 0.0086. The slope of age is the predicted change in endurance for each one-unit increase in age, holding exercise constant at 1 SD below its mean. The slope of exercise is the predicted change in endurance for a one-unit increase in exercise, holding age constant at 1 SD below its mean. The slope of the interaction is the increase in either one of these simple slopes as the other variable increases by one.
The slopes are: \(b_{age_c}\) = -0.14, \(b_{exercise_c}\) = 0.94, \(b_{interaction}\) = 0.0086. The slope of age is the predicted change in endurance for each one-unit increase in age, holding exercise constant at 1 SD above its mean. The slope of exercise is the predicted change in endurance for a one-unit increase in exercise, holding age constant at 1 SD above its mean. The slope of the interaction is the increase in either one of these simple slopes as the other variable increases by one.
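These three sets of simple slopes are linked by simple arithmetic: re-centering one predictor at 1 SD below (or above) its mean changes the other predictor's simple slope by 1 SD times the interaction slope. A quick check using the part-c coefficients (nothing new here, just a sanity check):
b <- coef(model2c)
# age simple slope 1 SD below and 1 SD above the mean of exercise (parts d & e)
b["xage.c"] - sd(n2$zexer) * b["xage.c:zexer.c"]
b["xage.c"] + sd(n2$zexer) * b["xage.c:zexer.c"]
# exercise simple slope 1 SD below and 1 SD above the mean of age (parts d & e)
b["zexer.c"] - sd(n2$xage) * b["xage.c:zexer.c"]
b["zexer.c"] + sd(n2$xage) * b["xage.c:zexer.c"]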
| age group | hi exercise | lo exercise |
|---|---|---|
| high | 28.3 | 22.4 |
| low | 30.4 | 25.0 |

| age group | M |
|---|---|
| high | 25.4 |
| low | 27.7 |

| exercise group | M |
|---|---|
| high | 29.4 |
| low | 23.7 |
The slopes of the age and exercise contrast codes are -2.31 and 5.69, respectively. These are differences in marginal means (displayed in parts g and h, respectively).
(I've copied and pasted my answer from part c, with minor modifications. I can do this because it's essentially the same model. The only things that have changed are what an increase of "one unit" means and what the "mean" of a variable indicates. That's it.) The slopes are: \(b_{age}\) = -2.31, \(b_{exercise}\) = 5.69, \(b_{interaction}\) = 0.393. The slope of age is the predicted change in endurance for each one-unit increase in age (which means switching from young to old), holding exercise constant at its mean (which, because the codes are centered, averages over everyone). The slope of exercise is the predicted change in endurance for a one-unit increase in exercise (which means switching from low to high exercise), holding age constant at its mean (again, averaging over everyone). The slope of the interaction is the increase in either one of these simple slopes as the other variable increases by one (i.e., switches from the low to the high group).
Alternatively, the simple slopes can be interpreted as being about marginal means. The age slope is the difference between the means of the old (M = 25.4) and young (M = 27.7). The exercise slope is the difference between the means of the low exercisers (M = 23.7) and high exercisers (M = 29.4). The interaction slope is the difference of differences in cell means ((28.33 - 22.44) - (30.45 - 24.95)) = 0.39.
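These interpretations can be verified directly. (I'm assuming the factor levels of ageGroup and exerGroup are literally "high" and "low", consistent with the ifelse() calls above; adjust the labels if not.)
table2g$yendu[1] - table2g$yendu[2] # age slope: high minus low marginal mean = -2.31
table2h$yendu[1] - table2h$yendu[2] # exercise slope: high minus low marginal mean = 5.69
m <- tapply(n2$yendu, list(n2$ageGroup, n2$exerGroup), mean) # matrix of cell means
(m["high", "high"] - m["high", "low"]) - (m["low", "high"] - m["low", "low"]) # interaction: 0.39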
The slopes are: \(b_{age}\) = -2.51, \(b_{exercise}\) = 5.49, \(b_{interaction}\) = 0.393. These are now about cell means. The age slope is about the difference between young and old when exercise = 0 (i.e., for the low exercisers). The exercise slope is about the difference between high and low exercisers when age = 0 (i.e., for young participants). The interaction slope is about the same thing it was in part j.
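Using the cell-means matrix m from the check above (same level-name assumption):
m["high", "low"] - m["low", "low"] # dummy-coded age slope: old vs. young among low exercisers, about -2.51
m["low", "high"] - m["low", "low"] # dummy-coded exercise slope: high vs. low exercise among the young, about 5.49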