
Nested for and if/else loops in R to create a new variable. Is there a better way to do this?

I am trying to get some input on how to make my code better/faster.

I have been working with sf objects lately, and I have to do a bunch of operations to figure out whether certain points I collected fall within certain shapefiles, and then run some correlations.

My input points are organized in a list of sf objects. For every sf object there is a data frame with a number of variables describing characteristics of the various points.

In particular, there is one variable describing whether the points were collected within or outside a certain geographic area (“IN” or “OUT”).

Unfortunately, when a point was collected in the field I had no way to test whether I really was within a specific polygon or not, so I marked it “IN” or “OUT” based on my personal experience and knowledge of the field site.

So, what I am doing now is empirically testing whether the points I collected in the field are actually within the shapefile/polygon of interest. I have already figured out the code to do so, and I am posting a mock example here to pick your brains and get insight into how I could make it faster/better.

My real data set has over 3000 points, organized in some 25 sf objects within a list. My mock example has only 90 data points organized in 2 sf objects within a list.

First I create 2 fake data frames and transform them into 2 sf objects with a specific CRS.

    library(sf)

    # first fake data frame: 45 points with random coordinates and a random in/out label
    df1 <- data.frame(id = c(paste("a_", 1:15, sep = ""),
                             paste("b_", 1:15, sep = ""),
                             paste("c_", 1:15, sep = "")),
                      species = rep("blabla", 45),
                      fondo = rep("pol_1", 45),
                      in_out = sample(c("in", "out"), 45, replace = T), # randomly assigning in or out
                      long = sample(1994764:1995265, 45, replace = T),
                      lat = sample(215659:216563, 45, replace = T))

    df_sf1 <- st_as_sf(df1,
                       coords = c("long", "lat"),
                       na.fail = FALSE,
                       crs = "+proj=utm +zone=19 +datum=WGS84"
    )
    df2 <- data.frame(id = c(paste("a_", 1:15, sep = ""), 
                             paste("b_", 1:15, sep = ""), 
                             paste("c_", 1:15, sep = "")),
                      species = rep("blabla", 45),
                      in_out = sample(c("in", "out"), 45, replace = T),
                      long = sample(1994764:1995265, 45, replace = T),
                      lat = sample(215659:216563, 45, replace = T))

    df_sf2 <- st_as_sf(df2,
                       coords = c("long", "lat"),
                       na.fail = FALSE,
                       crs = "+proj=utm +zone=19 +datum=WGS84"
    )

Then I sample some points from the 2 data frames to create a couple of fake polygons that will stand in for the shapefiles I am using in my real analysis. These are transformed into sf objects as well.

    # first fake polygon, built from a handful of points in df_sf1
    df_sf_polygon <- st_multipoint(st_coordinates(df_sf1[c(29, 21, 27, 3, 38), ]))
    df_sf_polygon <- st_sf(geom = st_sfc(df_sf_polygon))
    df_sf_polygon <- st_set_crs(df_sf_polygon, "+proj=utm +zone=19 +datum=WGS84")
    df_sf_polygon <- st_cast(df_sf_polygon, "POLYGON")

    # second fake polygon, built from points in df_sf2
    df_sf_polygon2 <- st_multipoint(st_coordinates(df_sf2[c(3, 21, 29), ]))
    df_sf_polygon2 <- st_sf(geom = st_sfc(df_sf_polygon2))
    df_sf_polygon2 <- st_set_crs(df_sf_polygon2, "+proj=utm +zone=19 +datum=WGS84")
    df_sf_polygon2 <- st_cast(df_sf_polygon2, "POLYGON")

I can plot the data so far to get a feel for what is going on.

    library(tmap)

    tm_shape(df_sf1) +
      tm_dots(col = "red",
              size = 0.8) +
      tm_shape(df_sf2) +
      tm_dots(col = "green",
              size = 0.8) +
      tm_shape(df_sf_polygon) +
      tm_borders() +
      tm_shape(df_sf_polygon2) +
      tm_borders()

As you can probably see, numerous points fall outside the two polygons (remember, this is a fake example, so it doesn’t matter if the polygons aren’t perfect or if they intersect each other). So now I want to test whether the information in the in_out column jibes with what really happened in the field. To do so I have written a nested for loop with if/else statements (in this mock example I am only going to use a for loop and an if/else statement).

The loop checks whether each point falls within a certain polygon/shapefile. If it does, the character string “IN” is written to a new variable called in_out_nest_2; if it doesn’t, “OUT” is written.

    list_data_sf <- list(df_sf1 = df_sf1, df_sf2 = df_sf2) # organize sf objects in a list

And here is the loop!

    for (x in 1:length(names(list_data_sf))) {
      # points from df_sf1 are tested against the first polygon,
      # everything else against the second polygon
      if (grepl("df_sf1", names(list_data_sf[x]), fixed = T) == T) {
        for (y in 1:length(list_data_sf[[x]]$id)) {
          # spatial subsetting list_data_sf[[x]][df_sf_polygon, ] keeps only
          # the points that intersect the polygon
          if (list_data_sf[[x]]$id[y] %in% list_data_sf[[x]][df_sf_polygon, ]$id){
            list_data_sf[[x]]$in_out_nest_2[y] <- "IN"
          } else {
            list_data_sf[[x]]$in_out_nest_2[y] <- "OUT"
          }
        }
      } else {
        for (y in 1:length(list_data_sf[[x]]$id)) {
          if (list_data_sf[[x]]$id[y] %in% list_data_sf[[x]][df_sf_polygon2, ]$id){
            list_data_sf[[x]]$in_out_nest_2[y] <- "IN"
          } else {
            list_data_sf[[x]]$in_out_nest_2[y] <- "OUT"
          }
        }
      }
    }

The code seems to be working just fine, but with the data I will be collecting in the future it could become really slow. I am open to and interested in any suggestions to make this code faster/better.
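
One direction I have been wondering about is dropping the inner loop entirely and letting st_intersects() do the point-in-polygon test in one vectorized call per sf object. Below is a rough sketch of what I mean (the named list polys, which pairs each sf object with the polygon it should be tested against, is just something I made up for this example; I am not sure it is the idiomatic way):

    library(sf)

    # hypothetical pairing of each sf object with its polygon, keyed by name
    polys <- list(df_sf1 = df_sf_polygon, df_sf2 = df_sf_polygon2)

    list_data_sf <- Map(function(pts, poly) {
      # st_intersects() returns, for every point, the indices of the polygon
      # features it intersects; lengths() > 0 is TRUE for points inside
      inside <- lengths(st_intersects(pts, poly)) > 0
      pts$in_out_nest_2 <- ifelse(inside, "IN", "OUT")
      pts
    }, list_data_sf, polys[names(list_data_sf)])

This would also make it easy to check afterwards how well my field labels agree with the spatial test, e.g. with table(toupper(list_data_sf$df_sf1$in_out), list_data_sf$df_sf1$in_out_nest_2), but I would love to hear whether there is a cleaner or faster way.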

Thanks for all the help!

P.S. Below is my sessionInfo()

    sessionInfo()
    #> R version 3.5.2 (2018-12-20)
    #> Platform: x86_64-apple-darwin15.6.0 (64-bit)
    #> Running under: macOS  10.15.5
    #> 
    #> Matrix products: default
    #> BLAS: /Library/Frameworks/R.framework/Versions/3.5/Resources/lib/libRblas.0.dylib
    #> LAPACK: /Library/Frameworks/R.framework/Versions/3.5/Resources/lib/libRlapack.dylib
    #> 
    #> locale:
    #> [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
    #> 
    #> attached base packages:
    #> [1] stats     graphics  grDevices utils     datasets  methods   base     
    #> 
    #> other attached packages:
    #>  [1] reprex_0.3.0        viridis_0.5.1       viridisLite_0.3.0  
    #>  [4] vegan_2.5-6         permute_0.9-5       units_0.6-3        
    #>  [7] forcats_0.5.0       stringr_1.4.0       purrr_0.3.4        
    #> [10] readr_1.3.1         tidyverse_1.3.0     tidyr_1.1.0        
    #> [13] tibble_3.0.1        tmap_3.0            spatstat_1.64-1    
    #> [16] rpart_4.1-15        nlme_3.1-148        spatstat.data_1.4-3
    #> [19] sf_0.9-4            scales_1.1.1        RMySQL_0.10.20     
    #> [22] DBI_1.1.0           Rmisc_1.5           rgdal_1.5-12       
    #> [25] readxl_1.3.1        plyr_1.8.6          moveVis_0.10.5     
    #> [28] lubridate_1.7.9     leaflet_2.0.3       lattice_0.20-41    
    #> [31] hms_0.5.3           hydroTSM_0.6-0      xts_0.12-0         
    #> [34] zoo_1.8-8           GISTools_0.7-4      rgeos_0.5-3        
    #> [37] MASS_7.3-51.6       RColorBrewer_1.1-2  maptools_1.0-1     
    #> [40] ggspatial_1.1.3     ggridges_0.5.2      ggrepel_0.8.2      
    #> [43] ggmap_3.0.0         ggplot2_3.3.2       dplyr_1.0.0        
    #> [46] dismo_1.1-4         raster_3.1-5        sp_1.4-2           
    #> [49] data.table_1.12.8   corrplot_0.84       colorspace_1.4-1   
    #> [52] chron_2.3-55        anytime_0.3.7
    #>      
    #> loaded via a namespace (and not attached):
    #>  [1] compiler_3.5.2  magrittr_1.5    tools_3.5.2     htmltools_0.5.0
    #>  [5] yaml_2.2.1      stringi_1.4.6   rmarkdown_2.3   highr_0.8      
    #>  [9] knitr_1.29      stringr_1.4.0   xfun_0.15       digest_0.6.25  
    #> [13] rlang_0.4.6     evaluate_0.14