Star Wars data

1 Explore the Star Wars data in tidyverse

library(tidyverse)
head(starwars)
# A tibble: 6 × 14
  name      height  mass hair_color skin_color eye_color birth_year sex   gender
  <chr>      <int> <dbl> <chr>      <chr>      <chr>          <dbl> <chr> <chr> 
1 Luke Sky…    172    77 blond      fair       blue            19   male  mascu…
2 C-3PO        167    75 <NA>       gold       yellow         112   none  mascu…
3 R2-D2         96    32 <NA>       white, bl… red             33   none  mascu…
4 Darth Va…    202   136 none       white      yellow          41.9 male  mascu…
5 Leia Org…    150    49 brown      light      brown           19   fema… femin…
6 Owen Lars    178   120 brown, gr… light      blue            52   male  mascu…
# ℹ 5 more variables: homeworld <chr>, species <chr>, films <list>,
#   vehicles <list>, starships <list>

2 Long vs. wide format

tatooine <- starwars %>% 
  filter(homeworld == "Tatooine") # select only those from Tatooine

tibble: the tidyverse version of a data frame. It is both a data.frame and a tibble (class tbl_df), so it works well with all of the related packages. Text columns in a tibble are kept as character vectors (instead of being converted to factors), which is usually easier to deal with.
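A quick way to check these points in the console is sketched below. The small `tb` table is made up for illustration and is not part of the starwars data.

```r
library(tidyverse)

# starwars ships with dplyr and is already a tibble
class(starwars)       # the class vector includes both "tbl_df" and "data.frame"
is_tibble(starwars)

# A small made-up tibble (hypothetical values, for illustration only)
tb <- tibble(name = c("Luke", "Leia"), height = c(172, 150))

is.data.frame(tb)     # a tibble is still a data frame
class(tb$name)        # text columns stay character, not factor
```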

2.0.1 Activity

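How many species are on Tatooine?

# use the filtered tatooine table, not the full starwars data
unique(tatooine$species)
length(unique(tatooine$species)) # note: NA would count as its own value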

Use your tidyverse cheatsheets and google!
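One possible tidyverse route to the same count is sketched here, assuming `distinct()` and `n_distinct()` are the verbs you land on from the cheatsheet. The `species_count` name is just a placeholder.

```r
library(tidyverse)

# Recreate the Tatooine subset so this chunk runs on its own
tatooine <- starwars %>%
  filter(homeworld == "Tatooine")

# One row per distinct species among Tatooine residents
tatooine %>%
  distinct(species)

# Number of distinct species; na.rm = TRUE leaves out missing species
species_count <- tatooine %>%
  summarise(n_species = n_distinct(species, na.rm = TRUE))
species_count
```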

2.0.2 Activity

Create a table of droids that are equal to or greater than 96 cm in height (the height column is recorded in centimeters).
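A minimal sketch of one way to approach this, using `filter()` with two conditions. Height in the starwars data is recorded in centimeters, and the choice of columns kept by `select()` is only a suggestion.

```r
library(tidyverse)

tall_droids <- starwars %>%
  # both conditions must hold; rows with a missing height are dropped by filter()
  filter(species == "Droid", height >= 96) %>%
  # keep a few readable columns (an arbitrary choice for display)
  select(name, height, mass, homeworld)

tall_droids
```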

3 Summarizing data

What are the average masses of Humans versus Ewoks?

Use group_by() and summarize().

starwars %>% 
  filter(species == "Human" | species == "Ewok") %>% 
  group_by(species) %>% 
  summarise(MEAN_mass = mean(mass))
# Modify the above to remove NAs and get a value for Humans.
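One way to make that modification is sketched below, assuming the goal is simply to ignore missing masses: `mean()` takes an `na.rm` argument. The `%in%` filter is an equivalent, shorter way to write the two `==` tests.

```r
library(tidyverse)

mean_mass <- starwars %>%
  filter(species %in% c("Human", "Ewok")) %>%
  group_by(species) %>%
  # na.rm = TRUE drops missing masses before averaging
  summarise(MEAN_mass = mean(mass, na.rm = TRUE))

mean_mass
```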

What about the average mass of humans and Ewoks on different planets?

starwars %>% 
  filter(species == "Human" | species == "Ewok") %>% 
  group_by(species, homeworld) %>% 
  summarise(MEAN_mass = mean(mass))

summarize() creates a smaller output table, or a summary, with one row per group. The groups are dictated by group_by(). mutate() adds a column to the data and keeps every row.
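The difference is easiest to see by running both verbs on the same grouped data. This is a small sketch; the names `by_species`, `summarised`, and `mutated` are placeholders introduced here.

```r
library(tidyverse)

by_species <- starwars %>%
  filter(species %in% c("Human", "Ewok")) %>%
  group_by(species)

# summarise(): collapses to one row per group
summarised <- by_species %>%
  summarise(MEAN_mass = mean(mass, na.rm = TRUE))
summarised

# mutate(): keeps every row and adds the group mean as a new column
mutated <- by_species %>%
  mutate(MEAN_mass = mean(mass, na.rm = TRUE)) %>%
  select(name, species, mass, MEAN_mass)
mutated
```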

3.0.1 Activity

Add a column to starwars that classifies tall versus short, based on the height of each species.

Add a column that defines if the species is Human versus non-human.

What are the mean masses of humans versus non-humans? And how many examples are included in this table?

What are the max and min heights of humans and non-humans by planet?
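A possible starting point for these questions is sketched below. It rests on a few assumptions that are not in the activity itself: "tall" is taken to mean above the mean height of the character's own species, characters with an unknown species are left as NA rather than labelled non-human, and the column names `size` and `human` are made up.

```r
library(tidyverse)

sw <- starwars %>%
  group_by(species) %>%
  # assumption: tall = above the mean height of the same species
  mutate(size = if_else(height > mean(height, na.rm = TRUE), "tall", "short")) %>%
  ungroup() %>%
  # NA species stays NA here (an assumption, not a rule from the data)
  mutate(human = if_else(species == "Human", "Human", "Non-human"))

# Mean mass and number of rows for humans versus non-humans
mass_by_human <- sw %>%
  group_by(human) %>%
  summarise(MEAN_mass = mean(mass, na.rm = TRUE), n = n())
mass_by_human

# Max and min heights by planet; missing heights are removed first
height_range <- sw %>%
  filter(!is.na(height)) %>%
  group_by(human, homeworld) %>%
  summarise(max_height = max(height), min_height = min(height), .groups = "drop")
height_range
```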

4 Star Wars data wrangling

Isolate non-droids on Alderaan, Naboo, Endor, Kamino, and Coruscant. Based on their reported sex, is there a relationship between their height, planet, and species?

head(starwars)
# A tibble: 6 × 14
  name      height  mass hair_color skin_color eye_color birth_year sex   gender
  <chr>      <int> <dbl> <chr>      <chr>      <chr>          <dbl> <chr> <chr> 
1 Luke Sky…    172    77 blond      fair       blue            19   male  mascu…
2 C-3PO        167    75 <NA>       gold       yellow         112   none  mascu…
3 R2-D2         96    32 <NA>       white, bl… red             33   none  mascu…
4 Darth Va…    202   136 none       white      yellow          41.9 male  mascu…
5 Leia Org…    150    49 brown      light      brown           19   fema… femin…
6 Owen Lars    178   120 brown, gr… light      blue            52   male  mascu…
# ℹ 5 more variables: homeworld <chr>, species <chr>, films <list>,
#   vehicles <list>, starships <list>
unique(starwars$homeworld)
 [1] "Tatooine"       "Naboo"          "Alderaan"       "Stewjon"       
 [5] "Eriadu"         "Kashyyyk"       "Corellia"       "Rodia"         
 [9] "Nal Hutta"      "Bestine IV"     NA               "Kamino"        
[13] "Trandosha"      "Socorro"        "Bespin"         "Mon Cala"      
[17] "Chandrila"      "Endor"          "Sullust"        "Cato Neimoidia"
[21] "Coruscant"      "Toydaria"       "Malastare"      "Dathomir"      
[25] "Ryloth"         "Aleen Minor"    "Vulpter"        "Troiken"       
[29] "Tund"           "Haruun Kal"     "Cerea"          "Glee Anselm"   
[33] "Iridonia"       "Iktotch"        "Quermia"        "Dorin"         
[37] "Champala"       "Geonosis"       "Mirial"         "Serenno"       
[41] "Concord Dawn"   "Zolan"          "Ojom"           "Skako"         
[45] "Muunilinst"     "Shili"          "Kalee"          "Umbara"        
[49] "Utapau"        
homes <- c("Alderaan", "Naboo", "Endor", "Kamino", "Coruscant")

starwars %>% 
  filter(homeworld %in% homes) %>%   # keep only the five planets of interest
  filter(species != "Droid") %>%     # drop droids (rows with NA species are dropped too)
  ggplot(aes(x = species, y = height, fill = sex)) +
    # geom_col() +                   # stacked alternative
    geom_col(position = "dodge") +   # same as geom_bar(stat = "identity", position = "dodge")
    facet_grid(homeworld ~ .)        # one panel per planet
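A table of the same groups can help when reading the plot, since dodged bars can hide how many characters sit behind each one. This is a sketch only; `height_summary` and its column names are placeholders.

```r
library(tidyverse)

homes <- c("Alderaan", "Naboo", "Endor", "Kamino", "Coruscant")

height_summary <- starwars %>%
  filter(homeworld %in% homes, species != "Droid") %>%
  group_by(homeworld, species, sex) %>%
  # mean height and number of characters in each planet/species/sex group
  summarise(MEAN_height = mean(height, na.rm = TRUE), n = n(), .groups = "drop")

height_summary
```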