Day 3B Practice
Question 1.
(a) Download the political dataset from the workshop website, copy it to your project folder, and read it into R. Mutate this tibble such that the sex variable is a factor (0 = Male, 1 = Female).
(b) Create a histogram to view the distributions of the informed and tv_news variables and a bar plot to view the counts of the sex variable. What do these distributions look like?
(c) Build a regression model to predict each student’s self-reported level of political informedness (informed) from the degree to which they watch television news (tv_news). What is the standardized slope for the television news predictor? Bonus: What is the model’s adjusted R-squared?
(d) Expand your regression model from Question 1(c) to become a multiple regression model that also includes students’ sex (sex) as a predictor. What are the two predictor variables’ partial effects?
(e) Statistically test whether the effect of television watching depends on students’ sex and generate a plot that shows the model’s predictions (to help you interpret any interaction effect).
Answer key
Answer (a)
library(tidyverse)
# Read the data and recode sex from 0/1 to a labelled factor
political <- read_csv("political.csv") |>
  mutate(sex = factor(sex, levels = c(0, 1), labels = c("Male", "Female"))) |>
  print()
#> # A tibble: 59 × 9
#>    student  year sex    voting         read_news read_…¹ tv_news ethic…² infor…³
#>      <dbl> <dbl> <fct>  <chr>              <dbl>   <dbl>   <dbl>   <dbl>   <dbl>
#>  1       1     1 Female Not Registered         3       1       0       2       2
#>  2       2     4 Female Voted                  1       1       3       2       2
#>  3       3     2 Male   Voted                  4       1       3       2       3
#>  4       4     3 Female Voted                  3       1       0       3       2
#>  5       5     1 Female Voted                  2       0       0       2       3
#>  6       6     2 Female Didn't Vote            3       1       3       2       4
#>  7       7     1 Male   Didn't Vote            4       1       0       4       4
#>  8       8     3 Female Voted                  2       1       1       2       3
#>  9       9     4 Male   Voted                  3       1       0       4       3
#> 10      10     4 Female Voted                  3       0       4       3       3
#> # … with 49 more rows, and abbreviated variable names ¹read_edit,
#> #   ²ethical_practical, ³informed
#> # ℹ Use `print(n = ...)` to see more rows
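To make the recoding step concrete, here is a minimal base-R sketch of what factor() does with levels and labels. The vector sex_raw is made up for illustration and is not the workshop data.

```r
# Hypothetical 0/1 codes, not the workshop data
sex_raw <- c(1, 1, 0, 1, 0)

# levels = the values as stored; labels = the display names, in the same order
sex <- factor(sex_raw, levels = c(0, 1), labels = c("Male", "Female"))

levels(sex)      # the first level ("Male") becomes the reference category in lm()
table(sex)       # counts per category
as.integer(sex)  # internal codes are 1 and 2, not the original 0 and 1
```

The order given in levels matters later on: regression coefficients for a factor are expressed relative to its first level.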
Answer (b)
ggplot(political, aes(x = informed)) + geom_histogram()
ggplot(political, aes(x = tv_news)) + geom_histogram()
ggplot(political, aes(x = sex)) + geom_bar()
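Because the rating variables take only a handful of integer values, a frequency table can confirm what a histogram shows. A minimal base-R sketch, using a made-up vector (informed_demo is hypothetical, not the workshop data):

```r
# Made-up ratings on a 1-5 scale, for illustration only
informed_demo <- c(2, 2, 3, 2, 3, 4, 4, 3, 3, 3, 1, 5)

counts <- table(informed_demo)     # frequency of each distinct value
counts
names(counts)[which.max(counts)]   # the most common value (the mode)
```

Applied to a real column, the same two calls give the exact counts behind each bar.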
The informed variable is discrete at 1, 2, 3, 4, or 5 with a triangular pattern where 3 is most common. The tv_news variable is discrete at 0, 1, 2, 3, or 4 with a pattern where 0 and 3 are most common. The sex variable is relatively balanced between male and female.
Answer (c)
library(parameters)
library(performance)
fit <- lm(informed ~ tv_news, data = political)
# The argument is spelled `standardize`; "refit" refits the model on z-scored data
model_parameters(fit, standardize = "refit")
model_performance(fit)
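As background on what a refit-style standardized slope is, the sketch below z-scores both variables by hand and fits the model again. It uses simulated data and base R only, so it illustrates the idea and does not reproduce the workshop results. In a simple regression with one predictor, the standardized slope equals Pearson's r, and R-squared equals its square.

```r
set.seed(1)                        # simulated data, not the workshop dataset
x <- rnorm(100)
y <- 0.5 * x + rnorm(100)

# z-score outcome and predictor, then fit the model again
zx <- as.numeric(scale(x))
zy <- as.numeric(scale(y))
fit_z <- lm(zy ~ zx)
beta_std <- unname(coef(fit_z)["zx"])

beta_std                           # standardized slope
cor(x, y)                          # Pearson correlation: the same value
summary(fit_z)$r.squared           # equals beta_std^2
```

This also gives a quick sanity check on reported output. Assuming all 59 rows are used, an adjusted R-squared of 0.123 corresponds to an R-squared of about 0.138 and so to |r| of about 0.37; a standardized slope near that value would be expected, and a noticeably smaller number is more likely the unstandardized slope.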
The standardized slope is 0.21 for tv_news and it is significantly different from zero \((p=.004)\). This simple model explains 12% of the variance in informed \((R^2_{adj}=0.123)\).
Answer (d)
# Add sex as a second predictor
fit2 <- lm(informed ~ tv_news + sex, data = political)
model_parameters(fit2)
The partial effect of television news was \(0.20\) and the partial effect of sex was \(-0.44\) (females reported feeling/being less informed than males).
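To illustrate how a partial effect for a two-level factor is read, the following sketch fits the same kind of model to simulated data. The names tv, grp, and fit_demo are hypothetical, and the generating values are invented. The coefficient for the non-reference level is the predicted gap between the groups at any fixed value of the other predictor.

```r
set.seed(2)                                    # simulated data for illustration
n <- 200
tv <- sample(0:4, n, replace = TRUE)
grp <- factor(sample(c("Male", "Female"), n, replace = TRUE),
              levels = c("Male", "Female"))    # "Male" is the reference level
y <- 2 + 0.2 * tv - 0.4 * (grp == "Female") + rnorm(n, sd = 0.5)

fit_demo <- lm(y ~ tv + grp)
coef(fit_demo)                                 # names: (Intercept), tv, grpFemale

# Female-minus-Male difference in predictions at the same tv value
new <- data.frame(tv = c(2, 2),
                  grp = factor(c("Male", "Female"), levels = c("Male", "Female")))
gap <- unname(diff(predict(fit_demo, newdata = new)))
gap                                            # identical to the grpFemale coefficient
```

Changing tv in the new data frame to any other shared value leaves gap unchanged, which is what "holding the other predictor constant" means in an additive model.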
Answer (e)
library(ggeffects)
# tv_news * sex expands to both main effects plus their interaction
fit3 <- lm(informed ~ tv_news * sex, data = political)
model_parameters(fit3)
ggeffect(fit3, terms = c("tv_news", "sex")) |> plot()
The effect of television news does not seem to depend on sex, as the interaction effect was not significant \((p=.887)\) and the prediction lines were very close to parallel.
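The link between the interaction coefficient and "parallel lines" can be sketched with simulated data in base R. The names tv, grp, and fit_int are hypothetical, and the data are generated without a true interaction. In a model of this form, the slope in the reference group is the main-effect coefficient, and the slope in the other group is that coefficient plus the interaction term.

```r
set.seed(3)                                    # simulated data for illustration
n <- 200
tv <- sample(0:4, n, replace = TRUE)
grp <- factor(sample(c("Male", "Female"), n, replace = TRUE),
              levels = c("Male", "Female"))
y <- 2 + 0.2 * tv - 0.4 * (grp == "Female") + rnorm(n, sd = 0.5)  # no true interaction

fit_int <- lm(y ~ tv * grp)                    # expands to tv + grp + tv:grp
b <- coef(fit_int)

slope_male   <- unname(b["tv"])                      # slope in the reference group
slope_female <- unname(b["tv"] + b["tv:grpFemale"])  # reference slope plus interaction
c(Male = slope_male, Female = slope_female)

# Compare the models with and without the product term
anova(lm(y ~ tv + grp), fit_int)
```

With a single interaction term, the F test from anova() gives the same p-value as the t test on the interaction coefficient; an interaction estimate near zero means the two group slopes are nearly equal, so the prediction lines are nearly parallel.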
Question 2
Did you bring your own data? If so, load it into R and explore it using the tools we learned this week. Wrangle it a bit and maybe do some basic statistics on it, estimating a few regression models. If not, then look through the datasets on the workshop website and find one that is interesting to you; do the same process I described above to practice your skills in a more self-directed manner.