Geospatial Analysis with R

class: center, middle, inverse, title-slide

.title[
# Geospatial Analysis with R
]
.subtitle[
## Class 9
]

---

```r
library(sf)
library(dplyr)
library(ggplot2)
library(rnaturalearth)
library(rnaturalearthdata)
data(world.cities, package = "maps")

world <- ne_countries(scale = "medium", returnclass = "sf")
afr_capitals <- world.cities %>% filter(capital == 1) %>% 
  st_as_sf(coords = c("long", "lat"), crs = 4326) %>% 
  st_intersection(., world %>% filter(continent == "Africa"))
p <- world %>% filter(continent == "Africa") %>% 
  ggplot() + geom_sf(aes(fill = name), lwd = 0.2) + 
  geom_sf(data = afr_capitals, col = "blue", size = 0.5) + 
  scale_fill_grey(guide = "none") + theme_minimal()  ## guide = FALSE is deprecated
ggsave(here::here("external/slides/figures/africa_capitals.png"), plot = p,
       width = 5, height = 4, dpi = 300, bg = "transparent")
```

---

## Functions
### Components

```r
function_name <- function(arg1, arg2 = 1:10, 
                          arg3 = ifelse(arg2 == 2, TRUE, FALSE)) {
  body
}
```

Three components of a function:
- `formals()`: the arguments
- `body()`: the code, which returns the value of the last evaluated expression unless `return(x)` is called first
- `environment()`: where the function looks up the values of the names it uses

Unnamed functions are **anonymous** functions. (Used in `*apply`)
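
The three components can be inspected directly. A minimal sketch, assuming a hypothetical function `add_n` that is not part of the original slides:

```r
## Hypothetical function, used only to inspect its components
add_n <- function(x, n = 1) {
  x + n
}
formals(add_n)      ## the argument list: x (no default) and n = 1
body(add_n)         ## the code between the braces
environment(add_n)  ## the environment in which add_n was defined

## An anonymous function: defined and called without being given a name
(function(x) x * 2)(5)
```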

---

A function can read a global `x`, but modifying `x` inside the function does not change its global value.

```r
x <- 1:10
myfun <- function() {
  x * 10
}
myfun()
```

```
##  [1]  10  20  30  40  50  60  70  80  90 100
```

```r
myfun <- function(x) {
  x <- x * 10
  return(x)
}
x <- 10
myfun(x = 20)
```

```
## [1] 200
```

```r
x
```

```
## [1] 10
```

---
Each time you run `myfun`, a new function environment is created.

```r
myfun <- function(x) {
  x <- x * 10
  print(environment())
  return(x)
}
myfun(x)
```

```
## <environment: 0x7fefab0d0748>
```

```
## [1] 100
```

```r
myfun(x)
```

```
## <environment: 0x7fefaac253f8>
```

```
## [1] 100
```

---
## Global assignment
Use `<<-` to change the value of a global variable from within a function.

```r
a <- 10
myfun <- function(x) {
  a <<- x * 10   ## note <<- instead of <- 
  return(a)
}
myfun(5)
```

```
## [1] 50
```

```r
print(a)
```

```
## [1] 50
```

---

```r
library(dplyr)
```

```
## 
## Attaching package: 'dplyr'
```

```
## The following objects are masked from 'package:terra':
## 
##     intersect, union
```

```
## The following objects are masked from 'package:stats':
## 
##     filter, lag
```

```
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
```

```r
search()
```

```
##  [1] ".GlobalEnv"        "package:dplyr"     "package:terra"    
##  [4] "tools:rstudio"     "package:stats"     "package:graphics" 
##  [7] "package:grDevices" "package:utils"     "package:datasets" 
## [10] "package:methods"   "Autoloads"         "package:base"
```

```r
set.seed(1)
v <- 1:100 + runif(100, -10, 10)
f <- filter(v, rep(1 / 5, 5))
```

```
## Error in UseMethod("filter"): no applicable method for 'filter' applied to an object of class "c('double', 'numeric')"
```

---

```r
detach("package:dplyr", unload = TRUE)
search()
```

```
##  [1] ".GlobalEnv"        "package:terra"     "tools:rstudio"    
##  [4] "package:stats"     "package:graphics"  "package:grDevices"
##  [7] "package:utils"     "package:datasets"  "package:methods"  
## [10] "Autoloads"         "package:base"
```

```r
f <- filter(v, rep(1 / 5, 5))
```

```r
plot(1:100, v, type = "l")
lines(1:100, f, col = "red")
```
<img src="figures/ts-filter.png" width="45%" style="display: block; margin: auto;" />
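
Detaching a package is one way around masking. Another option, sketched here, is to name the package explicitly with `::`, which should work whether or not `dplyr` is attached. The name `f2` is introduced only for this illustration.

```r
set.seed(1)
v <- 1:100 + runif(100, -10, 10)

## stats::filter is called directly, regardless of the search path order
f2 <- stats::filter(v, rep(1 / 5, 5))  ## centered 5-point moving average
class(f2)                              ## a time series ("ts") object
```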

---
## Useful functions

- `which` finds indices where a condition is true.

```r
v <- 10:15
print(v)
```

```
## [1] 10 11 12 13 14 15
```

```r
a <- which(v %% 3 == 0) ## indices of elements divisible by 3
print(a) ## shows indices where condition is true.
```

```
## [1] 3 6
```
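
The indices returned by `which` can then be used to pull out the values themselves. A short sketch (the name `idx` is an assumption made for this example):

```r
v <- 10:15
idx <- which(v %% 3 == 0)  ## positions 3 and 6
v[idx]                     ## values at those positions: 12 15
v[v %% 3 == 0]             ## same values, using the logical vector directly
```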

---
## Useful functions
- `which.min` and `which.max` find the index of the minimum and maximum value

```r
v <- sample(1:20, 10)
print(v)
```

```
##  [1] 12 16  1 13  5 15  6 18  9  7
```

```r
print(which.min(v)) # index of min value
```

```
## [1] 3
```

```r
print(which.max(v)) # index of max value
```

```
## [1] 8
```

---
## data.frame vs. data.table vs. tibble
- All are 2D (tabular) structures.
- `data.frame`: base R
- `tibble`: from the `tidyverse`
- `data.table`: built for speed

For now, we'll stick to `data.frame`.

---

## data.frame indexing

- `data.frame` uses the following to subset: `df[row conditions, column conditions]`

```r
df <- data.frame(v1 = 1:5, v2 = 6:10)
rownames(df) <- LETTERS[1:5]
print(df)
```

```
##   v1 v2
## A  1  6
## B  2  7
## C  3  8
## D  4  9
## E  5 10
```

---

## data.frame indexing

- Index using names.
- An empty row index, as in `[ , 'v2']`, means "keep all rows"

```r
df[,'v2'] ## column indexing
```

```
## [1]  6  7  8  9 10
```

```r
df[c("A", "B", "D"), ] ## row indexing
```

```
##   v1 v2
## A  1  6
## B  2  7
## D  4  9
```
---
## data.frame subset

- Logical subset

```r
df[df$v1 > 3, ] ## get observations (rows) where first column is larger than 3
```

```
##   v1 v2
## D  4  9
## E  5 10
```
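
Row and column conditions can also be combined. A small sketch that rebuilds the same `df` so it runs on its own:

```r
df <- data.frame(v1 = 1:5, v2 = 6:10)
rownames(df) <- LETTERS[1:5]

df[df$v1 > 1 & df$v2 < 10, ]  ## rows B, C, D: both conditions must hold
df[df$v1 > 3, "v2"]           ## rows D and E, column v2 only: 9 10
```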

---
## Control structures
### Branching

- Pay attention to `{ }` placement

```r
a <- 5
if(a > 10) {
  print("Greater than 10!")
} else {
  print("Less than or equal to 10")
}
```

```
## [1] "Less than or equal to 10"
```
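
One reason placement matters: when code is run at the top level, `else` has to follow the closing `}` on the same line, otherwise R treats the `if` as already finished. A sketch that also chains a second condition with `else if` (the variable `msg` and the message strings are illustrative):

```r
a <- 10
if (a > 10) {
  msg <- "Greater than 10!"
} else if (a == 10) {   ## else stays on the same line as the closing }
  msg <- "Exactly 10"
} else {
  msg <- "Less than 10"
}
print(msg)
```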

---
### Looping

```r
b <- 1:3
for(i in b) print(i)
```

```
## [1] 1
## [1] 2
## [1] 3
```

```r
b <- 1:5
a <- 2
for(i in b){
  a <- 2 * a
  print(a)
}
```

```
## [1] 4
## [1] 8
## [1] 16
## [1] 32
## [1] 64
```
---

### *apply
- A special form of looping
- Intended for *applying* a function to data, often an *anonymous* function.
- 3 main kinds: `sapply`, `lapply`, `apply`
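
To see how `*apply` relates to looping, here is a rough sketch of the same operation written both ways. The names `out_loop` and `out_apply` are assumptions made for this example.

```r
v <- 1:5

## for loop: pre-allocate the result, then fill it element by element
out_loop <- numeric(length(v))
for (i in seq_along(v)) {
  out_loop[i] <- v[i] + 10
}

## sapply: the same operation with an anonymous function
out_apply <- sapply(v, function(x) x + 10)

all.equal(out_loop, out_apply)  ## TRUE
```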

---
### `sapply`
- `sapply` iterates over its input and, when possible, simplifies the result to a vector.

```r
v <- 1:10
sapply(v, function(x) x + 10) ## adds 10 to each element in v.
```

```
##  [1] 11 12 13 14 15 16 17 18 19 20
```
Use `{ }` for more complicated functions, but be careful with the order of the closing `}` and `)`: the function body closes before the `sapply` call does.

```r
v1 <- 1:10
v2 <- sapply(v1, function(x){
  y <- x^2 
  return(y)
}) ## } closes the function body, then ) closes the sapply call
print(v2)
```

```
##  [1]   1   4   9  16  25  36  49  64  81 100
```
---
### `sapply`
If you don't call `return`, the value of the last evaluated expression is returned.

```r
v1 <- 1:10
v2 <- sapply(v1, function(x){
  y <- x^2  ## the value of y is returned
})
print(v2)
```

```
##  [1]   1   4   9  16  25  36  49  64  81 100
```
---

### `lapply`
- Similar to `sapply`, except that the result is always returned as a `list`.
- Useful if you need to store more complex objects (data.frame, plot, raster, etc.)

```r
v1 <- 1:10
v2 <- lapply(v1, function(x){
  y <- x^2  ## the value of y is returned
})
print(v2)
```

```
## [[1]]
## [1] 1
## 
## [[2]]
## [1] 4
## 
## [[3]]
## [1] 9
## 
## [[4]]
## [1] 16
## 
## [[5]]
## [1] 25
## 
## [[6]]
## [1] 36
## 
## [[7]]
## [1] 49
## 
## [[8]]
## [1] 64
## 
## [[9]]
## [1] 81
## 
## [[10]]
## [1] 100
```
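
As an illustration of storing more complex objects, the sketch below returns a small data.frame from each iteration and then stacks the list. The column names `id` and `sq` are hypothetical.

```r
## Each iteration returns a one-row data.frame
dfs <- lapply(1:3, function(x) {
  data.frame(id = x, sq = x^2)
})
class(dfs)                      ## "list"
dfs[[2]]                        ## second element: a one-row data.frame
stacked <- do.call(rbind, dfs)  ## stack the list into a single data.frame
stacked
```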
---

### `apply`
- `apply` works well for 2D data, when you want to apply a function over rows or columns.

```r
v1 <- sample(1:100, 10)
v2 <- sample(1:100, 10)
DF <- data.frame(v1, v2) ## data frame columns will take names of vectors
DF
```

```
##    v1 v2
## 1  87  1
## 2  83 43
## 3  90 59
## 4  48 26
## 5  64 15
## 6  94 58
## 7  60 29
## 8  51 24
## 9  34 42
## 10 10 48
```
---

Use `apply` to get the max value of each column. The second argument (`MARGIN`) set to 2 means "apply the function to columns".

```r
colMax <- apply(DF, 2, FUN = max)
colMax
```

```
## v1 v2 
## 94 59
```

---

Use `apply` to get the max value of each row. The second argument (`MARGIN`) set to 1 means "apply the function to rows".

```r
rowMax <- apply(DF, 1, FUN = max)
rowMax
```

```
##  [1] 87 83 90 48 64 94 60 51 42 48
```

We can use `apply` or `sapply` to create a new column in a data frame.

```r
DF$rowMax <- apply(DF, 1, FUN = max)
DF
```

```
##    v1 v2 rowMax
## 1  87  1     87
## 2  83 43     83
## 3  90 59     90
## 4  48 26     48
## 5  64 15     64
## 6  94 58     94
## 7  60 29     60
## 8  51 24     51
## 9  34 42     42
## 10 10 48     48
```
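
`apply` also accepts an anonymous function. A sketch, assuming an arbitrary seed and a hypothetical new column `rowRange`, so the values differ from the table above:

```r
set.seed(1)  ## arbitrary seed so the sketch is reproducible
DF <- data.frame(v1 = sample(1:100, 10), v2 = sample(1:100, 10))

## Difference between the largest and smallest value in each row
DF$rowRange <- apply(DF[, c("v1", "v2")], 1, function(x) max(x) - min(x))

## Same calculation without apply, as a check
all(DF$rowRange == abs(DF$v1 - DF$v2))  ## TRUE
```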

---
## Data generation

Create the following:
- `dat`, a data.frame built from `V1`, `V2`, `V3`, and `V4`, where:
  - `V1` = 1:20
  - `V2` is a random sample of integers from 1:100
  - `V3` is drawn from a random uniform distribution between 0 and 50
  - `V4` is a random selection of the letters A-E
  - Use `set.seed(50)`
- Do this all at once (i.e., wrap the creation of `V1`-`V4` in the `data.frame` call and precede it with `set.seed()`)

---

## Exercises

- Use a `for` loop to iterate over each row of `dat` and calculate its `sum`
- Do the same with `lapply` and `sapply`
- Do the same using `rowSums`
- Select rows from `dat` containing the letter "E" in `V4`, and take the mean of values from the result in column `V3`