Geospatial Analysis with R

Class 9

library(tidyverse)
library(sf)
library(geospaar)
districts <- read_sf(
  system.file("extdata", "districts.geojson", package = "geospaar")
)
farmers <- read_csv(
  system.file("extdata", "farmer_spatial.csv", package = "geospaar")
) %>% group_by(uuid) %>% 
  summarize(x = mean(x), y = mean(y), n = n()) %>%
  filter(y > -18) #%>% st_as_sf(coords = c("x", "y"), crs = 4326)
p <- ggplot() + 
  geom_sf(data = districts, lwd = 0.1) + 
  geom_point(data = farmers, 
             aes(x = x, y = y, size = n * 0.8, color = n), alpha = 0.9) +
  scale_color_viridis_c(guide = "none") + theme_void() + 
  theme(legend.position = c(0.85, 0.2)) +
  scale_size(range = c(0.1, 5), name = "N reports/week")
ggsave(here::here("docs/figures/zambia_farmer_repsperweek.png"), plot = p,
       width = 6, height = 4, dpi = 300, bg = "transparent")

Today

  • Control structures with emphasis on *apply

Control structures

Branching

  • Pay attention to { } placement: keep else on the same line as the closing } of the if block
a <- 5
if(a > 10) {
  print("Greater than 10!")
} else {
  print("Less than or equal to 10")
}
[1] "Less than or equal to 10"
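
The branch above can be extended with else if. This is a minimal sketch; the function name describe_value and the extra "exactly 10" case are illustrative assumptions, not part of the original slide. Each else stays on the same line as the preceding closing brace.

```r
# Hypothetical helper: classify a single number relative to 10
describe_value <- function(a) {
  if (a > 10) {
    "Greater than 10!"
  } else if (a == 10) {
    "Exactly 10"
  } else {
    "Less than 10"
  }
}

describe_value(5)
describe_value(10)
describe_value(12)
```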

Looping

b <- 1:3
for(i in b) print(i)
[1] 1
[1] 2
[1] 3
b <- 1:5
a <- 2
for(i in b){
  a <- 2 * a
  print(a)
}
[1] 4
[1] 8
[1] 16
[1] 32
[1] 64
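
Printing inside a loop shows the values but does not keep them. A small sketch of the same doubling loop that stores each result instead; the vector name out is an assumption made for illustration.

```r
b <- 1:5
a <- 2
# Preallocate one slot per iteration rather than growing the vector
out <- numeric(length(b))
for (i in seq_along(b)) {
  a <- 2 * a     # same doubling step as the loop above
  out[i] <- a    # store the value instead of printing it
}
out
```

seq_along(b) is used rather than 1:length(b) so the loop is skipped cleanly if b is ever empty.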

*apply

  • A special form of looping
  • Intended for applying a function to each element (or row/column) of your data. The function is often an anonymous one, defined in place.
  • 3 main kinds: sapply, lapply, apply
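
To see why *apply counts as a form of looping, here is a hedged side-by-side sketch: the same calculation written as a for loop and as an sapply call. The variable names res_loop and res_apply are illustrative.

```r
v <- 1:5

# for loop: preallocate, then fill one element at a time
res_loop <- numeric(length(v))
for (i in seq_along(v)) {
  res_loop[i] <- v[i] * 2
}

# sapply: the anonymous function is called once per element of v
res_apply <- sapply(v, function(x) x * 2)

# Compare the two results
all.equal(res_loop, res_apply)
```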

sapply

sapply iterates over each element of its input and simplifies the results, here to a vector.

v <- 1:10
sapply(v, function(x) x + 10) ## adds 10 to each element in v.
 [1] 11 12 13 14 15 16 17 18 19 20

Use { } for functions that span more than one line, but close things in the right order: the function's } comes before the ) that closes sapply.

v1 <- 1:10
v2 <- sapply(v1, function(x){
  y <- x^2 
  return(y)
})
print(v2)
 [1]   1   4   9  16  25  36  49  64  81 100

sapply

If you don’t call return, the function returns the value of the last expression it evaluated.

v1 <- 1:10
v2 <- sapply(v1, function(x){
  y <- x^2  ## last expression evaluated, so its value is returned
})
print(v2)
 [1]   1   4   9  16  25  36  49  64  81 100
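
What sapply returns depends on what the function returns for each element. The sketch below (object names s1 to s4 are illustrative) shows the three common outcomes, plus vapply, a base R relative of sapply that declares the expected result type up front.

```r
v1 <- 1:3

# One value per element: simplified to a vector
s1 <- sapply(v1, function(x) x^2)

# Two values per element: simplified to a matrix, one column per element
s2 <- sapply(v1, function(x) c(x, x^2))

# Results of differing length: no simplification, a list is returned
s3 <- sapply(v1, function(x) seq_len(x))

class(s1)
dim(s2)
class(s3)

# vapply errors if a result is not a single numeric value
s4 <- vapply(v1, function(x) x^2, numeric(1))
```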

lapply

  • Similar to sapply, except the result is always returned as a list.
  • Useful if you need to store more complex objects (data.frame, plot, raster etc.)
v1 <- 1:10
v2 <- lapply(v1, function(x){
  y <- x^2  ## last expression evaluated, so its value is returned
})
print(v2)
[[1]]
[1] 1

[[2]]
[1] 4

[[3]]
[1] 9

[[4]]
[1] 16

[[5]]
[1] 25

[[6]]
[1] 36

[[7]]
[1] 49

[[8]]
[1] 64

[[9]]
[1] 81

[[10]]
[1] 100
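
As an illustration of storing more complex objects, this sketch has each iteration return a small data.frame, then binds the list into one table. The column names x and x_squared are assumptions chosen for the example.

```r
v1 <- 1:3

# Each call returns a one-row data.frame; lapply keeps them in a list
dfs <- lapply(v1, function(x) {
  data.frame(x = x, x_squared = x^2)
})

# Bind the list of data.frames into a single data.frame
combined <- do.call(rbind, dfs)
combined
```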

apply

apply works well for 2D data (a matrix or data frame) when you want to apply a function to each row or each column.

v1 <- sample(1:100, 10)
v2 <- sample(1:100, 10)
DF <- data.frame(v1, v2) ## data frame columns will take names of vectors
DF
   v1 v2
1  20 14
2  58 79
3  78 40
4   6 46
5  90 17
6   3 62
7  35 21
8  98 30
9  75 20
10 43 37

Use apply to get the maximum of each column. The second argument (MARGIN) set to 2 means “apply the function to columns”.

colMax <- apply(DF, 2, FUN = max)
colMax
v1 v2 
98 79 

Use apply to get the maximum of each row. The second argument (MARGIN) set to 1 means “apply the function to rows”.

rowMax <- apply(DF, 1, FUN = max)
rowMax
 [1] 20 79 78 46 90 62 35 98 75 43

We can use apply or sapply to create a new column in a data frame.

DF$rowMax <- apply(DF, 1, FUN = max)
DF
   v1 v2 rowMax
1  20 14     20
2  58 79     79
3  78 40     78
4   6 46     46
5  90 17     90
6   3 62     62
7  35 21     35
8  98 30     98
9  75 20     75
10 43 37     43
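
apply also accepts an anonymous function, and sapply can walk over the columns of a data frame because a data frame is a list of columns. A hedged sketch follows; the seed value, the 5-row size, and the names row_range and col_max are arbitrary choices for illustration.

```r
set.seed(1)  # arbitrary seed so the sketch is reproducible
DF <- data.frame(v1 = sample(1:100, 5), v2 = sample(1:100, 5))

# Anonymous function over rows: spread (max - min) within each row
row_range <- apply(DF, 1, function(r) max(r) - min(r))

# sapply iterates over columns; compare with apply over MARGIN = 2
col_max <- sapply(DF, max)
all.equal(col_max, apply(DF, 2, max))

# Caution: apply() converts the data frame to a matrix first, so adding a
# character column turns every value into a character string
is.character(as.matrix(cbind(DF, g = "a")))
```

If a data frame mixes numeric and character columns, select the numeric columns before calling apply.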

Data generation

Create the following:

  • dat, a data.frame built from V1, V2, V3, and V4, where:
    • V1 = 1:20
    • V2 is a random sample of 20 values from 1:100
    • V3 is 20 values drawn from a random uniform distribution between 0 and 50
    • V4 is a random selection of 20 letters from A-E (sampled with replacement)
    • Use set.seed(50)
  • Do this all at once (i.e. wrap the creation of V1-V4 in the data.frame call, precede it with set.seed())

Exercises

  • Use a for loop to iterate over each row of dat and calculate its sum
  • Do the same with lapply and sapply
  • Do the same using rowSums
  • Select rows from dat containing the letter “E” in V4, and take the mean of values from the result in column V3