Geospatial Analysis with R

class: center, middle, inverse, title-slide

.title[
# Geospatial Analysis with R
]
.subtitle[
## Class 11
]

---


```r
library(dplyr)
library(ggplot2)
library(sf)

## read boundaries: Tanzania, SAGCOT, and Zambia (districts dissolved into one)
tz <- st_read(here::here("external/data/tanzania.geojson"))
sagcot <- st_read(here::here("external/data/sagcotcl.geojson"))
zambia <- st_read(
  system.file("extdata/districts.geojson", package = "geospaar")
) %>% st_union()

## get monthly WorldClim precipitation and sum the layers to an annual total
prec <- geodata::worldclim_global(var = "prec", res = 2.5,
                                  path = "external/data/")
precsum <- terra::app(prec, sum)

## crop and mask the annual total to the combined SAGCOT + Zambia area
tz_sagcot <- terra::vect(st_union(sagcot, zambia))
prec_tzzam <- terra::mask(terra::crop(precsum, tz_sagcot), tz_sagcot)

## plot rainfall with the boundaries on top, then save the figure
prec_stars <- stars::st_as_stars(prec_tzzam)
p <- ggplot() +
  geom_sf(data = tz) +
  stars::geom_stars(data = prec_stars) +
  scale_fill_viridis_c(name = "Rainfall (mm)", na.value = "transparent") +
  geom_sf(data = zambia, fill = "transparent") +
  geom_sf(data = sagcot, fill = "transparent") +
  labs(x = NULL, y = NULL) +
  theme_linedraw()
ggsave(p, filename = "docs/figures/tanzam_rainfall.png", height = 4,
       width = 7, units = "in", dpi = 300)
```

---
## Reading and writing data

### File paths

Let's read in a CSV a few different ways.

Full path: clear for you, but bad for code sharing, because the path only exists on your machine.

```r
data_tib <- read.csv(
  "/Users/lestes/Dropbox/teaching/geog246346/geospaar/inst/extdata/cdf_corn.csv"
)
str(data_tib)
```

```
## 'data.frame':	5999 obs. of  20 variables:
##  $ tenant      : chr  "planet" "planet" "planet" "planet" ...
##  $ site_id     : chr  "UNLEN_CSP1_IMZ1" "UNLEN_CSP1_IMZ1" "UNLEN_CSP1_IMZ1" "UNLEN_CSP1_IMZ1" ...
##  $ local       : chr  "2020-06-05 10:00:00-05:00" "2020-06-06 10:00:00-05:00" "2020-06-07 10:00:00-05:00" "2020-06-08 10:00:00-05:00" ...
##  $ lat         : num  41.2 41.2 41.2 41.2 41.2 ...
##  $ long        : num  -96.5 -96.5 -96.5 -96.5 -96.5 ...
##  $ site_group  : chr  "Lincoln" "Lincoln" "Lincoln" "Lincoln" ...
##  $ Tair        : num  27.4 25.5 29.6 27.7 26.7 ...
##  $ Tabove      : num  27.93 17.43 9.18 8.58 18.71 ...
##  $ Tbelow      : num  28 24.3 27 25.9 26.6 ...
##  $ CGDD        : int  345 363 381 393 402 414 425 439 455 471 ...
##  $ b1r         : num  0.098 0.052 0.0493 0.0484 0.0452 ...
##  $ b2r         : num  0.0822 0.0762 0.0739 0.0722 0.0676 ...
##  $ b3r         : num  0.2013 0.0911 0.0863 0.0848 0.0803 ...
##  $ b4r         : num  0.1447 0.1001 0.0991 0.0958 0.0899 ...
##  $ b5r         : num  0.0946 0.225 0.2378 0.246 0.2434 ...
##  $ b6r         : num  0.139 0.282 0.332 0.348 0.334 ...
##  $ b7r         : num  0.123 0.233 0.259 0.265 0.259 ...
##  $ NDVI        : num  -0.0209 0.4754 0.54 0.5686 0.5755 ...
##  $ CropType    : chr  "Corn" "Corn" "Corn" "Corn" ...
##  $ PlantingDate: chr  "04/20/2020" "04/20/2020" "04/20/2020" "04/20/2020" ...
```

---

```r
data_tib <- readr::read_csv(
  "/Users/lestes/Dropbox/teaching/geog246346/geospaar/inst/extdata/cdf_corn.csv"
)
```

```
## Rows: 5999 Columns: 20
## ── Column specification ────────────────────────────────────────────
## Delimiter: ","
## chr   (5): tenant, site_id, site_group, CropType, PlantingDate
## dbl  (14): lat, long, Tair, Tabove, Tbelow, CGDD, b1r, b2r, b3r,...
## dttm  (1): local
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
```

```r
str(data_tib)
```

```
## spc_tbl_ [5,999 × 20] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
##  $ tenant      : chr [1:5999] "planet" "planet" "planet" "planet" ...
##  $ site_id     : chr [1:5999] "UNLEN_CSP1_IMZ1" "UNLEN_CSP1_IMZ1" "UNLEN_CSP1_IMZ1" "UNLEN_CSP1_IMZ1" ...
##  $ local       : POSIXct[1:5999], format: "2020-06-05 15:00:00" ...
##  $ lat         : num [1:5999] 41.2 41.2 41.2 41.2 41.2 ...
##  $ long        : num [1:5999] -96.5 -96.5 -96.5 -96.5 -96.5 ...
##  $ site_group  : chr [1:5999] "Lincoln" "Lincoln" "Lincoln" "Lincoln" ...
##  $ Tair        : num [1:5999] 27.4 25.5 29.6 27.7 26.7 ...
##  $ Tabove      : num [1:5999] 27.93 17.43 9.18 8.58 18.71 ...
##  $ Tbelow      : num [1:5999] 28 24.3 27 25.9 26.6 ...
##  $ CGDD        : num [1:5999] 345 363 381 393 402 414 425 439 455 471 ...
##  $ b1r         : num [1:5999] 0.098 0.052 0.0493 0.0484 0.0452 ...
##  $ b2r         : num [1:5999] 0.0822 0.0762 0.0739 0.0722 0.0676 ...
##  $ b3r         : num [1:5999] 0.2013 0.0911 0.0863 0.0848 0.0803 ...
##  $ b4r         : num [1:5999] 0.1447 0.1001 0.0991 0.0958 0.0899 ...
##  $ b5r         : num [1:5999] 0.0946 0.225 0.2378 0.246 0.2434 ...
##  $ b6r         : num [1:5999] 0.139 0.282 0.332 0.348 0.334 ...
##  $ b7r         : num [1:5999] 0.123 0.233 0.259 0.265 0.259 ...
##  $ NDVI        : num [1:5999] -0.0209 0.4754 0.54 0.5686 0.5755 ...
##  $ CropType    : chr [1:5999] "Corn" "Corn" "Corn" "Corn" ...
##  $ PlantingDate: chr [1:5999] "04/20/2020" "04/20/2020" "04/20/2020" "04/20/2020" ...
##  - attr(*, "spec")=
##   .. cols(
##   ..   tenant = col_character(),
##   ..   site_id = col_character(),
##   ..   local = col_datetime(format = ""),
##   ..   lat = col_double(),
##   ..   long = col_double(),
##   ..   site_group = col_character(),
##   ..   Tair = col_double(),
##   ..   Tabove = col_double(),
##   ..   Tbelow = col_double(),
##   ..   CGDD = col_double(),
##   ..   b1r = col_double(),
##   ..   b2r = col_double(),
##   ..   b3r = col_double(),
##   ..   b4r = col_double(),
##   ..   b5r = col_double(),
##   ..   b6r = col_double(),
##   ..   b7r = col_double(),
##   ..   NDVI = col_double(),
##   ..   CropType = col_character(),
##   ..   PlantingDate = col_character()
##   .. )
##  - attr(*, "problems")=<externalptr>
```
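
The two readers guess column types differently, as the `str()` outputs above show: `local` is `chr` with `read.csv` but `POSIXct` (shifted to UTC) with `read_csv`, and `CGDD` is `int` versus `num`. A minimal sketch of the same behavior on a made-up CSV (the file, `csv_path`, and its column names are hypothetical, not `cdf_corn.csv`):

```r
library(readr)

# Write a tiny, made-up CSV to a temporary file
csv_path <- tempfile(fileext = ".csv")
writeLines(c("site,local,count",
             "A,2020-06-05 10:00:00,345",
             "B,2020-06-06 10:00:00,363"), csv_path)

# Read the same file with both readers
base_df <- read.csv(csv_path, stringsAsFactors = FALSE)
readr_tib <- read_csv(csv_path, show_col_types = FALSE)

# Compare the guessed column types
class(base_df$local)     # character: base R leaves timestamps as text
class(readr_tib$local)   # POSIXct/POSIXt: readr parses them as date-times
typeof(base_df$count)    # integer
typeof(readr_tib$count)  # double
```

If the guesses matter, both functions let you set types explicitly (`colClasses` in `read.csv`, `col_types` in `read_csv`).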

---
## Working directory `"."`

- Check the working directory with `getwd()` (from the console)
- It is usually set to the project folder

```r
getwd() ## if in an RMD, this will show the folder of the RMD
```

```
## [1] "/Users/lestes/Dropbox/teaching/geog246346/geospaar/docs"
```

- Use `.` to start a file path from the working directory

```r
list.files(".") ## files in working directory
```

```r
## resolves only if the working directory is the project root
data_tib <- readr::read_csv("./inst/extdata/cdf_corn.csv")
```

- Use `..` to go up one folder level

```r
list.files(".") ## files in working directory
list.files("..") ## files in folder one level up
```
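
A minimal sketch of how `.` and `..` resolve, using a throwaway folder tree in `tempdir()` (the `project/docs` layout and the `README.md` file are hypothetical stand-ins for a real project):

```r
# Build <tempdir>/project/docs with one file in the project root
project <- file.path(tempdir(), "project")
dir.create(file.path(project, "docs"), recursive = TRUE, showWarnings = FALSE)
invisible(file.create(file.path(project, "README.md")))

# Pretend we are working from docs/, as when knitting an Rmd stored there
old_wd <- setwd(file.path(project, "docs"))

here_name <- basename(normalizePath("."))   # "docs"
up_name <- basename(normalizePath(".."))    # "project"
found_up <- file.exists("../README.md")     # TRUE: the file is one level up
found_here <- file.exists("./README.md")    # FALSE: it is not in docs/

setwd(old_wd)  # restore the original working directory
```

The same relative path can succeed or fail depending on where the code runs, which is why it helps to know the working directory before reading files.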
---
## User directory `"~"`

- Set by an environment variable (`HOME` on macOS and Linux)
- Use the command below to see its value

```r
path.expand("~")
```

```
## [1] "/Users/lestes"
```

```r
data_tib <- readr::read_csv(
  "~/Dropbox/teaching/geog246346/geospaar/inst/extdata/cdf_corn.csv"
)
```
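
A short sketch of what `~` expands to, using base R only (`short_path` points at a hypothetical file and is used purely to show the expansion):

```r
home <- path.expand("~")   # the folder that `~` stands for
Sys.getenv("HOME")         # on macOS/Linux, normally the same folder

# Hypothetical path, written with `~` and then expanded
short_path <- file.path("~", "some_folder", "data.csv")
full_path <- path.expand(short_path)
full_path
```

A `~` path still assumes the same folder layout below each user's home directory, so it is only a partial fix for code sharing.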

---
## Writing files

- Use `write.csv` or `readr::write_csv` to write

```r
readr::write_csv(data_tib, file = "temp.csv") ## a relative path writes to the working directory
```
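
A round trip is a quick way to check what was written. The sketch below uses base R only and a small made-up `data.frame` (`df`, `out_path`, and `with_rn_path` are hypothetical names), writing to `tempdir()` instead of the working directory. It also shows one difference between the two writers: `write.csv` adds a row-name column unless `row.names = FALSE`, whereas `readr::write_csv` never writes row names.

```r
# Small, made-up dataset
df <- data.frame(id = 1:3, crop = c("Corn", "Soy", "Corn"),
                 stringsAsFactors = FALSE)

# Write without row names, then read back and compare
out_path <- file.path(tempdir(), "temp.csv")
write.csv(df, file = out_path, row.names = FALSE)
df_back <- read.csv(out_path, stringsAsFactors = FALSE)
same <- isTRUE(all.equal(df, df_back))  # TRUE: the round trip preserved the data

# With the default row.names = TRUE, an extra unnamed column is written,
# which read.csv names "X" when reading the file back
with_rn_path <- file.path(tempdir(), "temp_rownames.csv")
write.csv(df, file = with_rn_path)
cols_back <- names(read.csv(with_rn_path))  # "X" "id" "crop"
```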

---
## Saving/loading files
- Use this if you want to save an R object, such as a `data.frame` or `tibble`
- Use `save` with an `.rda` extension

```r
save(data_tib, file = "temp.rda") ## a relative path writes to the working directory
```

```r
data_tib <- NULL ## clear the object to show that load() restores it
load(file = "temp.rda") ## loads the saved object back into the environment
```
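
`load` restores objects under the names they were saved with, and returns those names. A minimal sketch with a made-up `data.frame` (`scores` and `rda_path` are hypothetical names), saved to `tempdir()`:

```r
# Made-up object to save
scores <- data.frame(id = 1:3, value = c(0.2, 0.5, 0.9))

rda_path <- file.path(tempdir(), "temp.rda")
save(scores, file = rda_path)

rm(scores)                   # remove the object entirely
restored <- load(rda_path)   # load() returns the names of the restored objects
restored                     # "scores"
n_rows <- nrow(scores)       # 3: the object is back under its original name
```

Because `load` silently overwrites any existing object with the same name, it is worth checking the returned names when loading files you did not create.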

---
## Dates with `lubridate`

- The main function you want to use is `as_date`, which converts a character string to a `Date`.

```r
library(lubridate)
date1 <- as_date("2022-03-01") ## date in standard YYYY-MM-DD format
print(date1)
```

```
## [1] "2022-03-01"
```
---
## Dates with `lubridate`

- More challenging with unclear date formats.

```r
date2 <- as_date("3/1/22") ## is month or day first?
```

```
## Warning: All formats failed to parse. No formats found.
```

```r
date2
```

```
## [1] NA
```

Specify the format explicitly, as shown below. See the [format codes at this link](https://epirhandbook.com/en/working-with-dates.html).

```r
date2 <- as_date("3/1/22", format = "%m/%d/%y" )
date2
```

```
## [1] "2022-03-01"
```
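
The format string decides which reading of an ambiguous date you get. A minimal sketch, assuming `lubridate` is installed (`ambiguous`, `us_date`, and `intl_date` are hypothetical names):

```r
library(lubridate)

ambiguous <- "3/1/22"

# Month first: March 1, 2022
us_date <- as_date(ambiguous, format = "%m/%d/%y")
# Day first: January 3, 2022
intl_date <- as_date(ambiguous, format = "%d/%m/%y")

# lubridate's order-based helpers state the same assumption in their names
us_helper <- mdy(ambiguous)    # month-day-year
intl_helper <- dmy(ambiguous)  # day-month-year

us_date == intl_date  # FALSE: same text, two different dates
```

Neither reading is wrong on its own terms, so the order has to come from knowledge of how the data were recorded.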

We can also write dates out in a desired format.

```r
date2_char <- format(date2, format = "%A %B %d, %Y")
date2_char
```

```
## [1] "Tuesday March 01, 2022"
```
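
The same format codes work for writing as for reading. A small base R sketch of a few numeric codes (`d` is a hypothetical name; codes that produce names, such as `%A` and `%B`, depend on the locale, so their output is not shown here):

```r
d <- as.Date("2022-03-01")

format(d, "%Y")        # "2022"  four-digit year
format(d, "%y")        # "22"    two-digit year
format(d, "%m")        # "03"    month number
format(d, "%d")        # "01"    day of month
format(d, "%j")        # "060"   day of year (31 + 28 + 1)
format(d, "%d/%m/%Y")  # "01/03/2022"

# Reading the formatted text back with the same codes returns the same date
round_trip <- as.Date(format(d, "%d/%m/%Y"), format = "%d/%m/%Y")
```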
---
## Date formats

<img src="figures/class10_date_formats.png" width="100%" style="display: block; margin: auto;" />
---
How can we read in this date?

```r
date3 <- as_date("Apr 3, 1999", format = "...")
date3
```

```
## [1] NA
```
---

## Exercises

- Use `lapply` to make three `data.frame`s, captured in a list `l`. Each should have two columns: `v1`, 20 integers randomly sampled from 1:10, and `v2`, 20 lowercase letters randomly selected using `sample`.
- The `lapply` should iterate over 10, 20, 30, which become the random seeds for the sampling (set in the body of the `lapply`).
- After making `l`, use a `for` loop to iterate through each element of `l`, writing each out to a folder `external/data/` in your project. 
- Change the name of each as part of the iteration, so that `l[[1]]` is written out as `external/data/dataset1.csv`, etc. Hint: you can use `paste0` to make each file path and name.  
- After writing these out, use another `lapply` to read back in the three datasets into a new list `l2`.  Bonus: Use `dir` to programmatically read in the file paths from your `external/data` folder.
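
A few building blocks that may help, sketched on a temporary folder standing in for `external/data/` (`out_dir`, `paths`, `found`, `first`, and `second` are hypothetical names; this is not a full solution):

```r
# A seed makes random sampling repeatable
set.seed(10)
first <- sample(1:10, 20, replace = TRUE)
set.seed(10)
second <- sample(1:10, 20, replace = TRUE)
identical(first, second)  # TRUE: same seed, same sample

# paste0 is vectorized, so one call can build several file paths
out_dir <- file.path(tempdir(), "external", "data")
dir.create(out_dir, recursive = TRUE, showWarnings = FALSE)
paths <- paste0(out_dir, "/dataset", 1:3, ".csv")
basename(paths)  # "dataset1.csv" "dataset2.csv" "dataset3.csv"

# Create empty placeholder files, then list them back with dir()
invisible(file.create(paths))
found <- dir(out_dir, pattern = "\\.csv$", full.names = TRUE)
basename(found)
```

`full.names = TRUE` returns paths that can be passed straight to a reader, while the `pattern` argument keeps other files in the folder out of the list.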