Day 2A Practice
Question 1.
Load the economics tibble (included in {tidyverse}). Then create a pipeline to accomplish the following data wrangling steps:
- First, in the economics tibble, rename the uempmed variable (the median duration of unemployment, in weeks) to duration and the unemploy variable (the number of unemployed, in thousands) to number.
- Then add a new variable called rate that is calculated by dividing number (from part a) by pop (the total population, in thousands).
- Then drop (i.e., unselect) the pce, pop, and psavert variables from the tibble.
- Then relocate the rate variable to be between the date and duration variables.
Answer key
Answer (a)
library(tidyverse)
economics |>
  rename(duration = uempmed, number = unemploy)
Answer (b)
economics |>
  rename(duration = uempmed, number = unemploy) |>
  mutate(rate = number / pop)
Answer (c)
economics |>
  rename(duration = uempmed, number = unemploy) |>
  mutate(rate = number / pop) |>
  select(-c(pce, pop, psavert))
Answer (d)
economics |>
  rename(duration = uempmed, number = unemploy) |>
  mutate(rate = number / pop) |>
  select(-c(pce, pop, psavert)) |>
  relocate(rate, .after = date)
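One way to confirm that the finished pipeline did what the four steps ask is to save its result and inspect it. The sketch below assumes {tidyverse} is installed; the object name econ_wrangled is made up here for illustration and is not part of the exercise.

```r
library(tidyverse)

# Rebuild the full pipeline from Answer (d) and store the result.
# `econ_wrangled` is a hypothetical name chosen for this check.
econ_wrangled <- economics |>
  rename(duration = uempmed, number = unemploy) |>
  mutate(rate = number / pop) |>
  select(-c(pce, pop, psavert)) |>
  relocate(rate, .after = date)

# Column names in their final order: rate should sit between date and duration
names(econ_wrangled)

# rate is a share of the total population, so it should fall between 0 and 1
range(econ_wrangled$rate)
```

If the dropped variables still appear in names(), or rate is not the second column, one of the pipeline steps was probably skipped or misordered.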
Question 2.
Download the cereal data file from the workshop website. Copy it to your Project folder and read it into R as a tibble. Then create a pipeline to produce a short list of cereals for me to try. I am only interested in cold cereals with a rating greater than 70. Please arrange the list so that the cereals with the fewest calories are displayed at the top. Finally, write the result of this pipeline to a CSV file called “jeffs_list.csv” (I expect it to be even more popular than Craig’s list).
Answer key
cereal <- read_csv("cereal.csv")
jeffs_list <- cereal |>
  filter(type == "cold" & rating > 70) |>
  arrange(calories)
jeffs_list
write_csv(jeffs_list, "jeffs_list.csv")
Question 3.
- Transform the drv variable in the mpg tibble into a factor where “4” is labeled Four Wheel Drive, “r” is labeled Rear Wheel Drive, and “f” is labeled Front Wheel Drive. Save this updated tibble as mpg2.
- Transform the manufacturer and model variables in the mpg2 tibble so that the first letter of each word is capitalized. Save this updated tibble as mpg3.
Answer key
Answer (a)
mpg2 <- mpg |>
  mutate(
    drv = factor(
      drv,
      levels = c("4", "r", "f"),
      labels = c("Four Wheel Drive", "Rear Wheel Drive", "Front Wheel Drive")
    )
  )
mpg2
Answer (b)
mpg3 <- mpg2 |>
  mutate(
    manufacturer = str_to_title(manufacturer),
    model = str_to_title(model)
  )
mpg3
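To check the two transformations, it can help to look at the factor levels and the distinct manufacturer names afterwards. This is a minimal sketch that rebuilds mpg2 and mpg3 as in the answers above, assuming {tidyverse} is installed; the checks at the end are suggestions, not part of the exercise.

```r
library(tidyverse)

# Recreate mpg2 and mpg3 as in Answers (a) and (b)
mpg2 <- mpg |>
  mutate(
    drv = factor(
      drv,
      levels = c("4", "r", "f"),
      labels = c("Four Wheel Drive", "Rear Wheel Drive", "Front Wheel Drive")
    )
  )

mpg3 <- mpg2 |>
  mutate(
    manufacturer = str_to_title(manufacturer),
    model = str_to_title(model)
  )

# The factor levels appear in the order given to `levels =`
levels(mpg2$drv)

# Count the rows per drive type to confirm no values became NA
count(mpg2, drv)

# Inspect the recapitalized manufacturer names (e.g., "land rover" becomes "Land Rover")
distinct(mpg3, manufacturer)
```

Any value of drv not listed in levels would become NA in the factor, so an NA row in the count() output would signal a mismatch between the levels and the data.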
Resources
R4DS Chapter 5 (§5.1–§5.5): Read more about the basic wrangling verbs
R4DS Chapter 18: Read more about pipes and pipelines [1]
R4DS Chapter 15: Read more about working with factors in R
Schafer & Graham (2002): Read more about missing data analysis [2]
Fun Stuff
Stunt Rope
These are some advanced wrangling verbs…
Pipeline
The official theme song of pipelines everywhere…
Footnotes
1. Note that this chapter talks about %>% being the pipe operator, which might confuse you at first since I am teaching |> as the pipe. Basically, |> is newer and requires no packages to be installed; it is often called the “native pipe” because it is built into R natively.
2. Note that ignoring missing values can lead to estimation bias and reduced power when performing statistical inference. Read this paper to learn more about better approaches, which can be done in R (e.g., using the {mice} package for multiple imputation or the {lavaan} package for full information maximum likelihood).
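To make the note about the two pipes concrete, the sketch below runs the same small pipeline once with |> and once with %>% and compares the results. It assumes R 4.1.0 or later (where |> was introduced) and that {tidyverse} is attached, which makes %>% available; the object names with_native and with_magrittr are made up for illustration.

```r
library(tidyverse)  # attaching {tidyverse} makes %>% available; |> needs no package

# The same pipeline written with the native pipe...
with_native <- mpg |>
  filter(cyl == 4) |>
  count(drv)

# ...and with the {magrittr} pipe used in the R4DS chapter
with_magrittr <- mpg %>%
  filter(cyl == 4) %>%
  count(drv)

# For simple left-to-right pipelines like this one, the two give the same result
isTRUE(all.equal(with_native, with_magrittr))
```

The two operators do differ in some advanced features (for example, how placeholders are written), so code copied from older tutorials may occasionally need small adjustments when switching to |>.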