class: center, middle, inverse, title-slide .title[ # Geospatial Analysis with R ] .subtitle[ ## Class 9 ] --- ```r library(sf) library(dplyr) library(ggplot2) library(rnaturalearth) library(rnaturalearthdata) data(world.cities, package = "maps") world <- ne_countries(scale = "medium", returnclass = "sf") afr_capitals <- world.cities %>% filter(capital == 1) %>% st_as_sf(coords = c("long", "lat"), crs = 4326) %>% st_intersection(., world %>% filter(continent == "Africa")) p <- world %>% filter(continent == "Africa") %>% ggplot() + geom_sf(aes(fill = name), lwd = 0.2) + geom_sf(data = afr_capitals, col = "blue", size = 0.5) + scale_fill_grey(guide = FALSE) + theme_minimal() ggsave(here::here("external/slides/figures/africa_capitals.png"), width = 5, height = 4, dpi = 300, bg = "transparent") ``` --- ## Functions ### Components ```r function_name <- function(arg1, arg2 = 1:10, arg3 = ifelse(arg2 == 2, TRUE, FALSE)) { body } ``` Three components of a function: - `formals()`: arguments - `body()`, the code, which returns the last object generated, unless specified with `return(x)`. - `environment()`, function finds the values Unnamed functions are **anonymous** functions. (Used in `*apply`) --- Using `x` in a function does not change its global value. ```r x <- 1:10 myfun <- function() { x * 10 } myfun() ``` ``` ## [1] 10 20 30 40 50 60 70 80 90 100 ``` ```r myfun <- function(x) { x <- x * 10 return(x) } x <- 10 myfun(x = 20) ``` ``` ## [1] 200 ``` ```r x ``` ``` ## [1] 10 ``` --- Each time you run `myfun`, a new function environment is created. ```r myfun <- function(x) { x <- x * 10 print(environment()) return(x) } myfun(x) ``` ``` ## <environment: 0x7fefab0d0748> ``` ``` ## [1] 100 ``` ```r myfun(x) ``` ``` ## <environment: 0x7fefaac253f8> ``` ``` ## [1] 100 ``` --- ## Global assignment. Use `<<-` to change value of global variable within a function. ```r a <- 10 myfun <- function(x) { a <<- x * 10 ## note <<- instead of <- return(a) } myfun(5) ``` ``` ## [1] 50 ``` ```r print(a) ``` ``` ## [1] 50 ``` --- ```r library(dplyr) ``` ``` ## ## Attaching package: 'dplyr' ``` ``` ## The following objects are masked from 'package:terra': ## ## intersect, union ``` ``` ## The following objects are masked from 'package:stats': ## ## filter, lag ``` ``` ## The following objects are masked from 'package:base': ## ## intersect, setdiff, setequal, union ``` ```r search() ``` ``` ## [1] ".GlobalEnv" "package:dplyr" "package:terra" ## [4] "tools:rstudio" "package:stats" "package:graphics" ## [7] "package:grDevices" "package:utils" "package:datasets" ## [10] "package:methods" "Autoloads" "package:base" ``` ```r set.seed(1) v <- 1:100 + runif(100, -10, 10) f <- filter(v, rep(1 / 5, 5)) ``` ``` ## Error in UseMethod("filter"): no applicable method for 'filter' applied to an object of class "c('double', 'numeric')" ``` --- ```r detach("package:dplyr", unload = TRUE) search() ``` ``` ## [1] ".GlobalEnv" "package:terra" "tools:rstudio" ## [4] "package:stats" "package:graphics" "package:grDevices" ## [7] "package:utils" "package:datasets" "package:methods" ## [10] "Autoloads" "package:base" ``` ```r f <- filter(v, rep(1 / 5, 5)) ``` ```r plot(1:100, v, type = "l") lines(1:100, f, col = "red") ``` <img src="figures/ts-filter.png" width="45%" style="display: block; margin: auto;" /> --- ## Useful functions - `which` finds indices where a condition is true. ```r v <- 10:15 print(v) ``` ``` ## [1] 10 11 12 13 14 15 ``` ```r a <- which(v %% 3 == 0) ## subset to elements divisible by 3 print(a) ## shows indices where condition is true. ``` ``` ## [1] 3 6 ``` --- ## Useful functions - `which.min` finds index of min value ```r v <- sample(1:20, 10) print(v) ``` ``` ## [1] 12 16 1 13 5 15 6 18 9 7 ``` ```r print(which.min(v)) # index of min value ``` ``` ## [1] 3 ``` ```r print(which.max(v)) # index of max value ``` ``` ## [1] 8 ``` --- ## data.frame vs data.table vs. tibble - all 2D structures. - data.frame = Base R - tibble = `tidyverse` - data.table = fast. For now, we'll stick to data.frame --- ## data.frame indexing - `data.frame` uses the following to subset: `[*row conditions, *column conditions]` ```r df <- data.frame(v1 = 1:5, v2 = 6:10) rownames(df) <- LETTERS[1:5] print(df) ``` ``` ## v1 v2 ## A 1 6 ## B 2 7 ## C 3 8 ## D 4 9 ## E 5 10 ``` --- ## data.frame indexing - Index using names. - Empty index `[ , 'v2]` means "keep all rows" ```r df[,'v2'] ## column indexing ``` ``` ## [1] 6 7 8 9 10 ``` ```r df[c("A", "B", "D"), ] ## row indexing ``` ``` ## v1 v2 ## A 1 6 ## B 2 7 ## D 4 9 ``` --- ## data.frame subset - Logical subset ```r df[df$v1 > 3, ] ## get observations (rows) where first column is larger than 3 ``` ``` ## v1 v2 ## D 4 9 ## E 5 10 ``` --- ## Control structures ### Branching - Pay attention to `{ }` placement ```r a <- 5 if(a > 10) { print("Greater than 10!") } else { print("Less than or equal to 10") } ``` ``` ## [1] "Less than or equal to 10" ``` --- ### Looping ```r b <- 1:3 for(i in b) print(i) ``` ``` ## [1] 1 ## [1] 2 ## [1] 3 ``` ```r b <- 1:5 a <- 2 for(i in b){ a <- 2 * a print(a) } ``` ``` ## [1] 4 ## [1] 8 ## [1] 16 ## [1] 32 ## [1] 64 ``` --- ### *apply - A special form of looping - Intended for *applying* a function to data. Uses *anonymous* function. - 3 main kinds: `sapply`, `lapply`, `apply` --- ### `sapply` - `sapply` iterates over input and returns a vector. ```r v <- 1:10 sapply(v, function(x) x + 10) ## adds 10 to each element in v. ``` ``` ## [1] 11 12 13 14 15 16 17 18 19 20 ``` Use `{ }` for more complicated functions. BUT be careful with order of `{ }`, `( )` ```r v1 <- 1:10 v2 <- sapply(v1, function(x){ y <- x^2 return(y) }) # print(v2) ``` ``` ## [1] 1 4 9 16 25 36 49 64 81 100 ``` --- ### `sapply` If you don't specify `return`, the last object created will be returned. ```r v1 <- 1:10 v2 <- sapply(v1, function(x){ y <- x^2 ## y will be returned }) # print(v2) ``` ``` ## [1] 1 4 9 16 25 36 49 64 81 100 ``` --- ### `lapply` - Similar to `sapply`, except final object is returned as `list`. - Useful if you need to store more complex objects (data.frame, plot, raster etc.) ```r v1 <- 1:10 v2 <- lapply(v1, function(x){ y <- x^2 ## y will be returned }) # print(v2) ``` ``` ## [[1]] ## [1] 1 ## ## [[2]] ## [1] 4 ## ## [[3]] ## [1] 9 ## ## [[4]] ## [1] 16 ## ## [[5]] ## [1] 25 ## ## [[6]] ## [1] 36 ## ## [[7]] ## [1] 49 ## ## [[8]] ## [1] 64 ## ## [[9]] ## [1] 81 ## ## [[10]] ## [1] 100 ``` --- ### `apply` - `apply` works well for 2D data, when you want to apply function over a row or column. ```r v1 <- sample(1:100, 10) v2 <- sample(1:100, 10) DF <- data.frame(v1, v2) ## data frame columns will take names of vectors DF ``` ``` ## v1 v2 ## 1 87 1 ## 2 83 43 ## 3 90 59 ## 4 48 26 ## 5 64 15 ## 6 94 58 ## 7 60 29 ## 8 51 24 ## 9 34 42 ## 10 10 48 ``` --- Use `apply` to get column max value. The index 2 means "apply function to columns". ```r colMax <- apply(DF, 2, FUN = max) colMax ``` ``` ## v1 v2 ## 94 59 ``` --- Use `apply` to get row max value. The index 1 means "apply function to rows". ```r rowMax <- apply(DF, 1, FUN = max) rowMax ``` ``` ## [1] 87 83 90 48 64 94 60 51 42 48 ``` We can use `apply` or `sapply` to create a new column in a data frame. ```r DF$rowMax <- apply(DF, 1, FUN = max) DF ``` ``` ## v1 v2 rowMax ## 1 87 1 87 ## 2 83 43 83 ## 3 90 59 90 ## 4 48 26 48 ## 5 64 15 64 ## 6 94 58 94 ## 7 60 29 60 ## 8 51 24 51 ## 9 34 42 42 ## 10 10 48 48 ``` --- ## Data generation Create the following: - `dat`, a data.frame built from `V1`, `V2`, `V3`, and `V4`, where: - `V1` = 1:20 - `V2` is a random sample between 1:100 - `V3` is drawn from a random uniform distribution between 0 and 50 - `V4` is a random selection of the letters A-E - Use `set.seed(50)` - Do this all at once (i.e. wrap the creation of V1-V4 in the `data.frame` call, precede it with `set.seed()`) --- ## Exercises - Use a `for` to iterate over each row of `dat` and calculate it's `sum` - Do the same with `lapply` and `sapply` - Do the same using `rowSums` - Select rows from `dat` containing the letter "E" in `V4`, and take the mean of values from the result in column `V3`