Appendix
GEOG246-346
1 Module 1 practice answers
N/A
2 Module 2 practice answers
N/A
3 Module 3 practice answers
3.1 Practice 1
3.1.1 Questions
- Class of
ais integer. - Because the vector is contained in a list, and using a single
[]to pull out an element returns a 1-element list, so you have to use list notation to get the vector. ais vector of integers,lis a list of containing 2 integer vectors and 1 character vector.- We applied a function that is the fourth element of list
l2to the integer vectorfthat is the second element ofl2.
3.2 Practice 2
3.2.1 Questions
- The vector is coerced to character.
- The whole
matrixis coerced to character. Ifdata.frame, onlym$bis coerced to character. - Have to use both column and row indices to attract row, col subset;
b) Can use list notation for
data.frame(e.g.m$b), but not formatrix.
- Have to use both column and row indices to attract row, col subset;
b) Can use list notation for
3.3 Practice 3
3.3.1 Questions
- 1 and 5 pulled from
band multiplied in sequence onmby row then column. - Would multiply 1st columns of each, 2nd columns of each, and 3rd columsn of each.
drow 1 columns 1 to 3 multiplied by the value from the last row, first column ofm.
3.4 Practice 4
3.4.1 Code
- It adds an NA to print statement for the 5th letter of the alphabet.
Expand
sscriptby adding another “th” to the vector, so it has 5 elements.
sscript <- c("st", "nd", "rd", "th", "th") # vector of superscripts
for(i in 1:5) { # for loop with iterator i over vector 1:4
stmnt <- paste0(letters[i], " is the ", i, sscript[i],
" letter in the alphabet")
print(stmnt) # print statement
}- Turns 4 of the letters red
- Adds axes to plots
sscript <- c("st", "nd", "rd", "th") # vector of superscripts
par(mfrow = c(1, 4), mar = c(0, 0, 1, 0.5))
for(i in 1:4) {
stmnt <- paste0(letters[i], " is the ", i, sscript[i],
" letter in the alphabet")
plot(1:4, rep(3, 4), ylim = c(1, 5), pch = letters[1:4], #axes = FALSE,
xlab = "", ylab = "", main = stmnt, cex = 2)
# points(i, 3, pch = letters[i], col = "red", cex = 2)
}# Chunk 26
for(i in 1:10) {
if(i < 5) { # condition 1
print(paste(i, "is less than", i + 1))
} else if(i >= 3 & i <= 8) { # condition 2
print(paste(i, "is between", i - 1, "and", i + 1))
} else { # remaining conditions
print(paste(i, "is greater than", i - 1))
}
}- Create a
forloop that iterates over a vector 1:20. Insert a condition into it such that it only prints out a result when the iterator’s value is 11
3.5 Practice 5
3.5.1 Code
- Turns the summary of “a” variables to NA.
dat_list <- list(data.frame(a = 1:10, b = 21:30),
data.frame(a = 31:40, b = 41:50),
data.frame(a = 51:60, b = 61:70))
dat_modify <- function(x) {
x[1:3, 1] <- 999
return(x)
}
dat_list <- lapply(dat_list, dat_modify)
dat_list
dat_list2 <- c(dat_list, mean) # add another element to dat_list
lapply(1:length(dat_list2), function(x) { # x <- 1
d <- dat_list2[[x]] # extract element of list
if(is.data.frame(d)) { # check if it is a data.frame
d[d == 999] <- NA # convert any 999 values to NA
o <- c(colSums(d, na.rm = FALSE), # column sums, dropping NAs
"total" = sum(d, na.rm = TRUE)) # sum dropping NAs
} else { # if it is not a data.frame, make an error statement
o <- paste("Operation not valid for a", class(d))
}
return(o) # return result
})- Can’t coerce list to double.
4 Module 4 practice answers
4.1 Practice 1
4.1.1 Questions
A
tibbleis an enhanceddata.frame. Among other features, it provides more information on data types it contains when printing. It can be treated exactly like adata.framethough, in terms of indexing and other operations.Base methods:
tb_a$a;tb_a[["a"]];tb_a[, "a"]. tidyverse, withtb_a %>% dplyr::select(a). Note that that gives back atibble(ordata.frame, if you have adata.frame). To get a vector back, usetb_a %>% dplyr::pull(a)The data are messy, because the “column headers are values, not variable names” (see here). We would gather the data in the month rows, setting the month as the key and the mean flu cases as the value.
inner_joinpreserves just the values in x and drops non-matching rows from y.left_joinfills non-matching values in y with NAs.right_joinpreserves values in y dropping non-matching values in x.full_joinpreserves all non-matching rows in both x and y, filling NAs into non-matched rows.
4.1.2 Code
set.seed(1)
dat <- tibble(a = sample(1:10), b = rnorm(10))
td <- "/path/where/you/want/to/write" # REPLACE THIS WITH YOUR OWN!!!!
readr::write_csv(dat, path = file.path(td, "dummy.csv"))td <- "/path/where/you/want/to/write" # REPLACE THIS WITH YOUR OWN!!!!
readr::read_csv(file.path(td, "dummy.csv"))- Recreate
tibbles first
# Chunk 13
set.seed(1)
t1 <- tibble(v1 = paste0("N", 1:5), v2 = rnorm(5))
t2 <- tibble(v1 = paste0("N", 1:5), v3 = runif(5))
t3 <- tibble(v1 = paste0("N", 1:7), v4 = sample(1:100, 7))
# v5 = letters[sample(1:26, 7)])
t4 <- tibble(v1 = paste0("N", c(1:2, 4:7, 11)),
v5 = letters[sample(1:26, 7)])Then do joins:
left_join(t1, t2) %>% left_join(., t3) %>% left_join(., t4)
right_join(t1, t2) %>% right_join(., t3) %>% right_join(., t4)left_join(t1, t2) %>% left_join(., t3) %>% left_join(., t4) %>% arrange(v5)
right_join(t1, t2) %>% right_join(., t3) %>% right_join(., t4) %>%
arrange(desc(v5))fs <- dir(system.file("extdata/", package = "geospaar"), pattern = "FAOSTAT",
full.names = TRUE)
crops <- lapply(fs, readr::read_csv)
crops_df <- do.call(rbind, lapply(crops, function(x) {
x %>% dplyr::select(Item, Area, Element, Year, Value) %>%
pivot_wider(names_from = Element, values_from = Value) %>%
rename(crop = Item, country = Area, year = Year,
harv_area = `Area harvested`, prod = Production)
}))
crop_ylds <- crops_df %>% mutate(yield = prod / harv_area)
crop_ylds <- crop_ylds %>%
mutate(country = ifelse(country == "South Africa", "ZAF", country)) %>%
mutate(country = ifelse(country == "Zambia", "ZMB", country)) %>%
mutate(harv_km2 = harv_area / 100)my_tb <- tibble(v1 = 1:10, v2 = 11:20) %>%
rbind(., tibble(v1 = 11:20, v2 = 21:30)) %>% mutate(v3 = v2^2) %>%
arrange(-v3)4.2 Practice 2: Analysis
4.2.1 Questions
dplyr::filtergroup_by(crop, country, y2k)is doing the splitting, on crop type, then country, and then year.summarize(...)is doing the apply using amean. There is no combine line, as it is implicit.- Chunk 30 adds a
filterfor crop type (selecting out maize), and then simply groups on the y2k variable. - It doesn’t work when the output of the analysis is not tabular/a
list, as with
cor.testandlm. We can overcome this by 1) creating individual functions that reproduce the component outputs of the analysis (e.g. Chunk 33) and these as a list of functions usingfunstosummarise_all, 2) doing the splits outside of the pipeline (e.g. Chunk 36), or 3) using functions such asdoandbroom::tidywithin the pipeline (e.g. Chunk 39).
4.2.2 Code
crop_ylds %>% filter(crop == "sorghum" & country == "ZAF" & year >= 2000)
crop_ylds[crop_ylds$crop == "sorghum" & crop_ylds$country == "ZAF" &
crop_ylds$year >= 2000, ]crop_ylds %>% filter(crop == "sorghum" & country == "ZAF" & year >= 2000) %>%
select(prod, harv_area, yield) %>% summarise_all(funs(mean, sd))crop_ylds %>% filter(crop == "maize" & country == "ZMB") %>%
select(yield, harv_area) %>% cor()
dat <- crop_ylds %>% filter(crop == "maize" & country == "ZMB")
cor.test(dat$harv_area, dat$yield)- South Africa shows larger yields gains (0.061 t/ha/yr versus 0.03 t/ha/yr)
summary(lm(yield ~ year,
data = crop_ylds %>% filter(crop == "maize" & country == "ZMB")))
summary(lm(yield ~ year,
data = crop_ylds %>% filter(crop == "maize" & country == "ZAF")))
crop_ylds %>% filter(crop == "maize" & country == "ZMB") %>%
lm(yield ~ year, data = .) %>% summary()
crop_ylds %>% filter(crop == "maize" & country == "ZAF") %>%
lm(yield ~ year, data = .) %>% summary()4.3 Practice 3: Visualization
4.3.1 Questions
ggplot2is built fromgridgraphics, and is based on an underlying visualization philosophy. It builds up graphics objects using the+operator, and easily does splits within the data using the “color” argument withinaesand/or thefacet_gridfunction.graphicsplots can be faster to implement for exploratory analysis,ggplot2has more attractive, presentation-grade defaults.- Because the syntax used in
graphicsplots is used in many of the plotting functions developed for spatial packages, including newer ones such assfandstars. - Each of the three plots takes the axis labels exactly as they are specified to the axis arguments (either as they were specified in the formula in the case of Chunk 40 or to the “x” and “y” arguments in Chunk 41). You can change the names using the “xlab” and “ylab” arguments.
- Using “col”, “pch”, and “cex” arguments.
- You have to add the
.to the “data” argument ofplot, e.g.dat %>% plot(y ~ x, data = .) - You get just an empty grey background–
ggplotwon’t plot anything without ageom_*function added to theggplotobject.
4.3.2 Code
crop_ylds %>% filter(crop == "sorghum") %>%
ggplot() + geom_histogram(aes(x = yield), bins = 15) +
ggtitle("Distribution of sorghum yields")
crop_ylds %>% filter(crop == "sorghum") %>%
ggplot() + geom_histogram(aes(x = yield), bins = 15, fill = "red") +
ggtitle("Distribution of sorghum yields")crop_ylds %>% filter(crop == "wheat" & country == "ZAF") %>%
plot(harv_area ~ year, data = ., pch = 16, col = "blue",
xlab = "", ylab = "Harvested area (ha)",
main = "South Africa wheat (1961-2017)")
crop_ylds %>% filter(crop == "wheat" & country == "ZAF") %>%
plot(harv_area ~ year, data = ., pch = 16, type = "l", col = "blue",
xlab = "", ylab = "Harvested area (ha)",
main = "South Africa wheat (1961-2017)")crop_ylds %>% filter(crop == "wheat" & country == "ZAF") %>%
ggplot() + geom_point(aes(year, harv_area), col = "blue") +
xlab("") + ylab("Harvested area (ha)") +
ggtitle("South Africa wheat (1961-2017)")
crop_ylds %>% filter(crop == "wheat" & country == "ZAF") %>%
ggplot() + geom_line(aes(year, harv_area), col = "blue") +
xlab("") + ylab("Harvested area (ha)") +
ggtitle("South Africa wheat (1961-2017)")crop_ylds %>% filter(crop == "wheat") %>%
ggplot() + geom_line(aes(year, harv_area, color = country)) +
scale_color_manual(values = c("red", "blue")) +
xlab("") + ylab("Harvested area (ha)") +
ggtitle("Wheat (1961-2017)")
# extra
crop_ylds %>% filter(crop == "wheat") %>%
ggplot() + geom_line(aes(year, log10(harv_area), color = country)) +
scale_color_manual(values = c("red", "blue")) +
xlab("") + ylab("Harvested area (ha)") +
ggtitle("Wheat (1961-2017)")crop_ylds %>% filter(crop == "wheat" & country == "ZAF") %>%
ggplot() + geom_point(aes(year, harv_area)) +
geom_smooth(aes(year, harv_area)) +
xlab("") + ylab("Harvested area (ha)") +
ggtitle("South African wheat (1961-2017)")# ggplot2
crop_ylds %>% filter(crop == "wheat" & country == "ZMB") %>%
ggplot() +
geom_histogram(aes(x = harv_area), bins = 10, col = "black", fill = "blue") +
xlab("Harvested area (ha)") + ggtitle("Zambian Wheat (1961-2017)")
# hist
# with dplyr
crop_ylds %>% filter(crop == "wheat" & country == "ZMB") %>% pull(harv_area) %>%
hist(., main = "Zambian Wheat (1961-2017)", xlab = "Harvested area (ha)",
col = "blue")
# with base subsetting
hist(crop_ylds$harv_area[crop_ylds$crop == "wheat" &
crop_ylds$country == "ZMB"],
main = "Zambian Wheat (1961-2017)", xlab = "Harvested area (ha)",
col = "blue")crop_ylds %>% filter(country == "ZAF") %>%
ggplot() + geom_point(aes(x = year, y = harv_area)) +
geom_smooth(aes(x = year, y = harv_area)) +
facet_grid(cols = vars(crop)) +
scale_color_manual(values = c("red", "blue")) +
ylab("Yield (tons/ha)") + xlab("")