Day 2A Practice

Question 1.

Load the economics tibble (included in {tidyverse}). Then create a pipeline to accomplish the following data wrangling steps:

(a) First, in the economics tibble, rename the uempmed variable (the median duration of unemployment, in weeks) to duration and the unemploy variable (the number of unemployed, in thousands) to number.
(b) Then add a new variable called rate that is calculated by dividing number (from part a) by pop (the total population, in thousands).
(c) Then drop (i.e., unselect) the pce, pop, and psavert variables from the tibble.
(d) Then relocate the rate variable to be between the date and duration variables.

Answer key

Answer (a)
library(tidyverse) # attaches dplyr and ggplot2 (which provides economics)

economics |>
  rename(duration = uempmed, number = unemploy)
Answer (b)
economics |> 
  rename(duration = uempmed, number = unemploy) |> 
  mutate(rate = number / pop)
Answer (c)
economics |> 
  rename(duration = uempmed, number = unemploy) |> 
  mutate(rate = number / pop) |> 
  select(-c(pce, pop, psavert))
Answer (d)
economics |>
  rename(duration = uempmed, number = unemploy) |> 
  mutate(rate = number / pop) |> 
  select(-c(pce, pop, psavert)) |> 
  relocate(rate, .after = date)
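
If you want to double-check your own pipeline, one option is to save the result and look at the column names. This is only a sketch: the name econ2 is something I made up for the check, and it is not part of the exercise.

```r
library(tidyverse)

# `econ2` is a hypothetical name used only for this check
econ2 <-
  economics |>
  rename(duration = uempmed, number = unemploy) |>
  mutate(rate = number / pop) |>
  select(-c(pce, pop, psavert)) |>
  relocate(rate, .after = date)

# Column order after all four steps
names(econ2)
#> [1] "date"     "rate"     "duration" "number"
```

Writing relocate(rate, .before = duration) instead should place rate in the same position, since date and duration are adjacent at that point.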

Question 2.

Download the cereal data file from the workshop website. Copy it to your Project folder and read it into R as a tibble. Then create a pipeline to produce a short list of cereals for me to try. I am only interested in cold cereals with a rating greater than 70. Please arrange the list so that the cereals with the fewest calories are displayed at the top. Finally, write the result of this pipeline to a CSV file called “jeffs_list.csv” (I expect it to be even more popular than Craig’s list).

Answer key

cereal <- read_csv("cereal.csv") 

jeffs_list <- 
  cereal |> 
  filter(type == "cold" & rating > 70) |> 
  arrange(calories)

jeffs_list
write_csv(jeffs_list, "jeffs_list.csv")
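
If you don't have cereal.csv handy, the same filter, arrange, and write steps can be tried on a tiny made-up tibble. Everything in this sketch is an assumption for illustration: the rows are invented (not the real cereal data), the name column is a guess at what the file contains, and the output goes to a temporary file rather than your Project folder.

```r
library(tidyverse)

# Made-up rows for illustration only; these are NOT the real cereal data
toy_cereal <- tribble(
  ~name,      ~type,  ~calories, ~rating,
  "Cereal A", "cold",       110,      75,
  "Cereal B", "hot",        100,      80,
  "Cereal C", "cold",        50,      90,
  "Cereal D", "cold",        70,      60
)

# Same steps as the answer key: keep cold cereals rated above 70,
# then sort so the fewest calories come first
toy_list <-
  toy_cereal |>
  filter(type == "cold" & rating > 70) |>
  arrange(calories)

# Write to a temporary file, then read it back to confirm the round trip
out_path <- tempfile(fileext = ".csv")
write_csv(toy_list, out_path)
check <- read_csv(out_path, show_col_types = FALSE)
check$name
#> [1] "Cereal C" "Cereal A"
```

Reading the file back in is a cheap way to confirm that what landed on disk matches what you saw in the console.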

Question 3.

(a) Transform the drv variable in the mpg tibble into a factor where “4” is labeled Four Wheel Drive, “r” is labeled Rear Wheel Drive, and “f” is labeled Front Wheel Drive. Save this updated tibble as mpg2.
(b) Transform the manufacturer and model variables in the mpg2 tibble so that the first letter of each word is capitalized. Save this updated tibble as mpg3.

Answer key

Answer (a)
mpg2 <- 
  mpg |> 
  mutate(
    drv = factor(
      drv, 
      levels = c("4", "r", "f"), 
      labels = c("Four Wheel Drive", "Rear Wheel Drive", "Front Wheel Drive")
    )
  )
mpg2
Answer (b)
mpg3 <- 
  mpg2 |> 
  mutate(
    manufacturer = str_to_title(manufacturer),
    model = str_to_title(model)
  )
mpg3
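
A quick way to confirm that both transformations worked is to inspect the factor levels and a few of the recoded strings. The sketch below condenses parts (a) and (b) into a single mutate() call purely for checking; mpg_check is a name I made up, and the two-step answers above are still what the question asks for.

```r
library(tidyverse)

# `mpg_check` is a hypothetical name; this combines answers (a) and (b)
mpg_check <-
  mpg |>
  mutate(
    drv = factor(
      drv,
      levels = c("4", "r", "f"),
      labels = c("Four Wheel Drive", "Rear Wheel Drive", "Front Wheel Drive")
    ),
    manufacturer = str_to_title(manufacturer),
    model = str_to_title(model)
  )

# Levels appear in the order given to `levels =`
levels(mpg_check$drv)
#> [1] "Four Wheel Drive"  "Rear Wheel Drive"  "Front Wheel Drive"

# How many cars fall into each drive type
count(mpg_check, drv)

# A few of the capitalized manufacturer names
head(unique(mpg_check$manufacturer))
```

If count() shows an NA row, it usually means a value in drv was missing from the levels argument.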

Resources

R4DS Chapter 5 (§5.1–§5.5): Read more about the basic wrangling verbs
R4DS Chapter 18: Read more about pipes and pipelines¹
R4DS Chapter 15: Read more about working with factors in R
Schafer & Graham (2002): Read more about missing data analysis²

Fun Stuff

Stunt Rope

These are some advanced wrangling verbs…

Pipeline

The official theme song of pipelines everywhere…

Footnotes

¹ Note that this chapter talks about %>% being the pipe operator, which might confuse you at first since I am teaching |> as the pipe. Basically, |> is newer (it was added in R 4.1) and requires no packages to be installed; it is often called the “native pipe” because it is built into R itself.
² Note that ignoring missing values can lead to estimation bias and reduced power when performing statistical inference. Read this paper to learn more about better approaches, which can be implemented in R (e.g., using the {mice} package for multiple imputation or the {lavaan} package for full information maximum likelihood).
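
To illustrate the note about the two pipes: for simple pipelines like the ones in this practice, |> and %>% should give the same result. This is a minimal sketch, assuming R 4.1 or later (for the native pipe) and that {tidyverse} is attached (which makes %>% available); the object names are made up for the comparison.

```r
library(tidyverse)

# Native pipe: built into R (4.1 and later), no package needed for the operator
n_native <- mpg |> filter(cyl == 4) |> nrow()

# magrittr pipe: available once the tidyverse is attached
n_magrittr <- mpg %>% filter(cyl == 4) %>% nrow()

# Both pipelines count the same rows
identical(n_native, n_magrittr)
#> [1] TRUE
```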