Nested for and else_if loop in R to create a new variable. Is there a better way to do this?

I am trying to get some input on how to make my code better/faster.

I have been working with sf objects lately, and I have to do a bunch of operations to figure out whether certain points I collected are within certain shape files and then run some correlations.

My input points are organized in a list of sf objects. For every sf object there is a data frame with a number of variables describing characteristics of the various points.

In particular, there is one variable describing whether the points were collected within or outside a certain geographic area (“IN” or “OUT”).

Unfortunately, when a point was collected in the field I had no way to test whether I really was within a specific polygon or not and I marked “IN” or “OUT” based on my personal experience and knowledge of the field site.

So, what I am doing now is empirically testing whether the points that I collected in the field are actually within the shapefile/polygon of interest. I have already figured out all the code to do so, and I am posting here a mock example to pick your brain and get insight on how I could make it faster/better.

My real data points are over 3000, organized in some 25 sf objects within a list. My mock example will have only 90 data points organized in 2 sf objects within a list

First I create 2 fake data frames and I transform them in 2 sf objects with specific CRS.

    df1 <- data.frame(id = c(paste("a_", 1:15, sep = ""), 
                            paste("b_", 1:15, sep = ""), 
                            paste("c_", 1:15, sep = "")),
                     species = rep("blabla", 45),
                     fondo = rep("pol_1", 45),
                     in_out = sample(c("in", "out"), 45, replace = T), #randomly assigning in or out variable
                     long = sample(1994764:1995265, 45, replace = T),
                     lat = sample(215659:216563, 45, replace = T))

    df_sf1 <- st_as_sf(df1,
             coords = c("long", "lat"),
             na.fail = FALSE,
             crs = "+proj=utm +zone=19 +datum=WGS84"
    )
    df2 <- data.frame(id = c(paste("a_", 1:15, sep = ""), 
                             paste("b_", 1:15, sep = ""), 
                             paste("c_", 1:15, sep = "")),
                      species = rep("blabla", 45),
                      in_out = sample(c("in", "out"), 45, replace = T),
                      long = sample(1994764:1995265, 45, replace = T),
                      lat = sample(215659:216563, 45, replace = T))

    df_sf2 <- st_as_sf(df2,
                       coords = c("long", "lat"),
                       na.fail = FALSE,
                       crs = "+proj=utm +zone=19 +datum=WGS84"
    )

Then I sample some points from the 2 dfs to create a couple of fake polygons that will substitute the shape files I am using in my real analysis. They will be transformed to sf objects as well.

    df_sf_polygon <- st_multipoint(st_coordinates(df_sf[c(29, 21, 27, 3, 38), ]))
    df_sf_polygon <- st_sf(geom = st_sfc(df_sf_polygon))
    df_sf_polygon <- st_set_crs(df_sf_polygon, "+proj=utm +zone=19 +datum=WGS84")
    df_sf_polygon <- st_cast(df_sf_polygon, "POLYGON")

    df_sf_polygon2 <- st_multipoint(st_coordinates(df_sf2[c(3, 21, 29), ]))
    df_sf_polygon2 <- st_sf(geom = st_sfc(df_sf_polygon2))
    df_sf_polygon2 <- st_set_crs(df_sf_polygon2, "+proj=utm +zone=19 +datum=WGS84")
    df_sf_polygon2 <- st_cast(df_sf_polygon2, "POLYGON")

I can plot the data so far to have a feel for what is going on.

   tm_shape(df_sf1) +
      tm_dots(col = "red",
      size = 0.8) +
      tm_shape(df_sf2) +
      tm_dots(col = "green",
      size = 0.8) +) +
      tm_shape(sp_df_sf_polygon) +
      tm_borders() +
      tm_shape(sp_df_sf_polygon2) +
      tm_borders()

As you can probably see, there are numerous pints that fall outside of the two polygons that I have (remember, this is a fake example, so it doesn’t matter if the polygons aren’t perfect or if they intersect with each other). So now I want to test if the information in the column in_out jibe with what really happened in the field. To do so I have written a nested for{} loop with if{} and else_if{} statements (in this mock example I am only going to use a for{} and an if{} statements).

The loop checks whether every single point falls within a certain polygon/shapefile. If it does the character string “IN” is added to a new variable called in_out_nest_2, if it doesn’t “OUT” is added.

    list_data_sf <- list(df_sf1 = df_sf1, df_sf2 = df_sf2) # organize sf objects in a list

And here is the loop!

    for (x in 1:length(names(list_data_sf))) {
      if (grepl("df_sf1", names(list_data_sf[x]), fixed = T) == T) {
        for (y in 1:length(list_data_sf[[x]]$id)) {
          if (list_data_sf[[x]]$id[y] %in% list_data_sf[[x]][df_sf_polygon, ]$id){
            list_data_sf[[x]]$in_out_nest_2[y] <- "IN"
          } else {
            list_data_sf[[x]]$in_out_nest_2[y] <- "OUT"
          }
        }
      } else {
        for (y in 1:length(list_data_sf[[x]]$id)) {
          if (list_data_sf[[x]]$id[y] %in% list_data_sf[[x]][df_sf_polygon2, ]$id){
            list_data_sf[[x]]$in_out_nest_2[y] <- "IN"
          } else {
            list_data_sf[[x]]$in_out_nest_2[y] <- "OUT"
          }
        }
      }
    }

The code seem to be working just fine, even though, with the data that I will be collecting in the future, it could become really slow. I am open/interested in suggestions to make this code faster/better.

Thanks for all the help!

P.S. Below is my sessionInfo()

    sessionInfo()
    #> R version 3.5.2 (2018-12-20)
    #> Platform: x86_64-apple-darwin15.6.0 (64-bit)
    #> Running under: macOS  10.15.5
    #> 
    #> Matrix products: default
    #> BLAS: /Library/Frameworks/R.framework/Versions/3.5/Resources/lib/libRblas.0.dylib
    #> LAPACK: /Library/Frameworks/R.framework/Versions/3.5/Resources/lib/libRlapack.dylib
    #> 
    #> locale:
    #> [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
    #> 
    #> attached base packages:
    #> [1] stats     graphics  grDevices utils     datasets  methods   base     
    #> 
    #>
    #> other attached packages:
    #>  [1] reprex_0.3.0        viridis_0.5.1       viridisLite_0.3.0  
    #>  [4] vegan_2.5-6         permute_0.9-5       units_0.6-3        
    #>  [7] forcats_0.5.0       stringr_1.4.0       purrr_0.3.4        
    #> [10] readr_1.3.1         tidyverse_1.3.0     tidyr_1.1.0        
    #> [13] tibble_3.0.1        tmap_3.0            spatstat_1.64-1    
    #> [16] rpart_4.1-15        nlme_3.1-148        spatstat.data_1.4-3
    #> [19] sf_0.9-4            scales_1.1.1        RMySQL_0.10.20     
    #> [22] DBI_1.1.0           Rmisc_1.5           rgdal_1.5-12       
    #> [25] readxl_1.3.1        plyr_1.8.6          moveVis_0.10.5     
    #> [28] lubridate_1.7.9     leaflet_2.0.3       lattice_0.20-41    
    #> [31] hms_0.5.3           hydroTSM_0.6-0      xts_0.12-0         
    #> [34] zoo_1.8-8           GISTools_0.7-4      rgeos_0.5-3        
    #> [37] MASS_7.3-51.6       RColorBrewer_1.1-2  maptools_1.0-1     
    #> [40] ggspatial_1.1.3     ggridges_0.5.2      ggrepel_0.8.2      
    #> [43] ggmap_3.0.0         ggplot2_3.3.2       dplyr_1.0.0        
    #> [46] dismo_1.1-4         raster_3.1-5        sp_1.4-2           
    #> [49] data.table_1.12.8   corrplot_0.84       colorspace_1.4-1   
    #> [52] chron_2.3-55        anytime_0.3.7
    #>      
    #> loaded via a namespace (and not attached):
    #>  [1] compiler_3.5.2  magrittr_1.5    tools_3.5.2     htmltools_0.5.0
    #>  [5] yaml_2.2.1      stringi_1.4.6   rmarkdown_2.3   highr_0.8      
    #>  [9] knitr_1.29      stringr_1.4.0   xfun_0.15       digest_0.6.25  
    #> [13] rlang_0.4.6     evaluate_0.14

Custom multivariate spread of variance

Recently I had to deal with reviewing some figures I prepared for a manuscript that my colleagues and I submitted for publication.

One figure in particular needed a little bit of “re-styling”. It was the representation of the multivariate-spread of variance using a Principal Component Analysis. My dataset was fairly complex with two species of large reptiles sampled across three years.

The default graphical output produced using plot() is shown below.

With a nice explanatory caption the image could work, but it probably would be best to try and make it from scratch. I struggled a while to figure out the appropriate code to make an image that could deliver the important and appropriate information. I eventually stumbled upon this post that gave me a bunch of good ideas on how to develop my own code to make better graphics.

The main idea is that I wanted to graphically represent the distinction between the two different species sampled across multiple years. I also wanted to highlight differences between sexes. Below is the code and thought process that I followed!

First, the data. To make the example reproducible I used dput() from my R! session and copied the data. The code block below creates two objects: cent and vect. It is just numbers. Feel free to use them to reproduce the example.

cent <- structure(list(grps = structure(c(1L, 7L, 4L, 9L, 2L, 8L, 5L, 10L, 3L, 6L, 11L), 
                          .Label = c("P.F.2012", "P.F.2014", "P.F.2015", 
                          "P.M.2012", "P.M.2014", "P.M.2015", "Y.F.2012", "Y.F.2014", "Y.M.2012", 
                          "Y.M.2014", "Y.M.2015"), class = "factor"), PCoA1 = c(-0.232308295213946, 
                          -0.139600907716025, -0.23871740447522, -0.13289458216504, 0.447510859027901, 
                          0.408846369396462, 0.421557222881589, 0.420000775027015, 0.126708509926023, 
                          0.162432592104704, 0.173235470911422), PCoA2 = c(-0.0767320953158099, 
                          -0.0309422674481247, -0.0595237554222932, 0.0521870006926146, 
                          -0.0196829955701272, -0.0966715610040982, -0.0116833317430039, 
                          -0.126017773437853, 0.472363514000923, 0.526616708002587, 0.410527734672885
                          )), row.names = c("P.F.2012", "Y.F.2012", "P.M.2012", "Y.M.2012", 
                          "P.F.2014", "Y.F.2014", "P.M.2014", "Y.M.2014", "P.F.2015", "P.M.2015", 
                          "Y.M.2015"), class = "data.frame")

vect <- structure(list(group = structure(c(3L, 4L, 1L, 2L, 2L, 2L, 1L, 
              3L, 4L, 4L, 2L, 2L, 2L, 2L, 2L, 2L, 4L, 2L, 4L, 4L, 3L, 3L, 1L, 
              3L, 2L, 3L, 3L, 1L, 1L, 1L, 3L, 3L, 4L, 3L, 1L, 3L, 3L, 3L, 3L, 
              3L, 3L, 2L, 4L, 2L, 4L, 4L, 4L, 4L, 3L, 3L, 3L, 3L, 1L, 1L, 3L, 
              1L, 1L, 3L, 3L, 4L, 4L, 4L, 2L, 2L, 3L, 3L, 1L, 3L, 3L, 4L, 3L, 
              1L, 3L, 1L, 4L, 3L, 4L, 3L, 4L, 2L, 4L, 2L, 1L, 3L, 4L, 3L, 4L, 
              3L, 4L, 4L, 3L, 1L, 3L, 2L, 3L, 4L, 1L, 4L, 2L, 2L, 2L, 1L, 4L, 
              2L, 2L, 4L, 4L, 4L, 3L, 4L, 3L, 3L, 3L, 3L, 2L, 3L, 3L, 3L, 3L, 
              3L, 2L, 4L, 3L, 3L, 3L, 4L, 4L, 4L, 4L, 1L, 4L, 4L, 1L, 3L, 1L, 
              4L, 3L, 3L, 2L, 2L, 1L, 3L, 1L, 2L, 2L, 4L, 4L, 2L, 2L, 1L, 8L, 
              8L, 6L, 8L, 6L, 7L, 5L, 7L, 7L, 8L, 8L, 7L, 7L, 7L, 7L, 5L, 7L, 
              7L, 7L, 7L, 8L, 5L, 8L, 8L, 7L, 5L, 7L, 7L, 8L, 8L, 5L, 5L, 5L, 
              5L, 6L, 6L, 8L, 5L, 8L, 8L, 8L, 8L, 5L, 5L, 6L, 5L, 6L, 8L, 8L, 
              7L, 5L, 5L, 6L, 6L, 6L, 5L, 6L, 6L, 6L, 6L, 9L, 10L, 11L, 9L, 
              11L, 11L, 9L, 11L, 11L, 11L, 10L, 11L, 11L, 10L, 11L, 10L, 10L), 
              .Label = c("P.F.2012", "Y.F.2012", "P.M.2012", "Y.M.2012", 
              "P.F.2014", "Y.F.2014", "P.M.2014", "Y.M.2014", "P.F.2015", "P.M.2015", 
              "Y.M.2015"), class = "factor"), PCoA1 = c(-0.230399563384464, 
              -0.134660213481351, -0.203754629089395, -0.256544903538633, -0.0963269236640878, 
              -0.0694337387298244, -0.281789671494329, -0.186386572449756, 
              -0.225180773660388, -0.171370571290936, -0.185937653177261, -0.0313349898106232, 
              -0.177414154493466, -0.286788004227091, -0.057096258535677, 0.0311506121862437, 
              0.00737219333238985, -0.23944399904694, 0.0250201000172532, -0.124498498733216, 
              -0.318666093918789, -0.263058322357503, -0.30437379108452, -0.317716787227452, 
              -0.22936858056337, -0.324020128865818, -0.156284741626089, -0.234488238994369, 
              -0.322947505553436, -0.269254541985182, -0.202171132788021, -0.184842644761263, 
              -0.195859234992588, -0.282295012488084, -0.207769950061687, -0.0432998408453922, 
              -0.278422767726366, -0.00425519745151709, -0.341252606885391, 
              -0.314974083712529, -0.322809684519165, -0.0261828937940813, 
              -0.111006443736141, -0.129545880345861, -0.144589704542049, -0.215927245615552, 
              -0.172530265813857, 0.0341339808656641, -0.301776947671195, -0.33372174237259, 
              -0.186333103181287, -0.197020579486089, -0.178341113926678, -0.277170972770639, 
              -0.250578381488708, -0.198668672218084, -0.260180040538445, -0.231282025822095, 
              -0.316097034833028, -0.168722470602506, -0.175171067223132, -0.214789024421525, 
              -0.215777252772746, -0.230687622399837, -0.216584634169675, -0.290464154917697, 
              -0.217674204709272, -0.309944103911973, -0.334205206599481, -0.110429776451595, 
              -0.26762240366662, -0.131455593967047, -0.168378287926015, -0.295041743014695, 
              -0.241407574594379, -0.230288558216916, -0.254831792720723, -0.124513765659242, 
              0.100275505577226, -0.134938432890682, -0.0558328735565296, -0.252848991866849, 
              -0.164579672506183, -0.197494806897518, -0.141098827666042, -0.254971309037345, 
              -0.167242702726603, -0.23236613623913, 0.0395174244388881, -0.239609836805462, 
              -0.276340875811544, -0.143108361747883, -0.211025748926658, -0.181045470349234, 
              -0.253458918772677, -0.173851745508372, -0.226500308336377, -0.125557875115246, 
              -0.100646606550761, 0.1292556268151, -0.190044071510094, -0.239993579830368, 
              -0.196185432695898, -0.169723453666052, 0.0840757691524504, -0.0828552732030694, 
              -0.18867602251596, -0.240262651678313, -0.240562707504407, -0.195571582299229, 
              -0.29368079508683, -0.189669520157007, -0.106469206939931, -0.159254877744035, 
              -0.252544984788576, -0.181045470349234, -0.251501044444335, -0.14873585609214, 
              -0.341409312007338, -0.29422201212156, 0.00583753495184317, 0.0101775937951025, 
              -0.233187528813313, -0.226864081321846, -0.244494472376726, -0.171210842923591, 
              -0.0595344383338595, -0.0571697905980911, -0.0342463649025507, 
              -0.289206711262372, -0.0433007380702916, -0.24157131439507, -0.181225503601252, 
              -0.298959942537002, -0.248514795038792, -0.279157389102156, -0.231035001544293, 
              -0.234516167629517, -0.0873790575474149, -0.114569468928722, 
              -0.309163789150021, -0.259807938377265, -0.15242487724223, -0.282032055003666, 
              -0.226296767992886, -0.0824717212910516, -0.228792585525858, 
              -0.315322705170076, -0.0386727609378934, -0.237770817011458, 
              0.348112407782663, 0.356305338806301, 0.259899300706282, 0.418475142444894, 
              0.363700298369635, 0.360103274061879, 0.396746129120811, 0.438573922603235, 
              0.382239976357167, 0.388295212367119, 0.300273822578281, 0.364027851104267, 
              0.360612509938702, 0.437527965244732, 0.47438425812642, 0.471126779343484, 
              0.405017558499106, 0.46774443014255, 0.442403233583862, 0.458968573092587, 
              0.464042031205564, 0.451745508543372, 0.466751018499985, 0.445124133569052, 
              0.481801433158007, 0.463260537321389, 0.400227866051992, 0.418509759984409, 
              0.454496511114008, 0.449220367124618, 0.386587934663972, 0.427091644994218, 
              0.450605988088211, 0.441802433298112, 0.384144699353354, 0.452138792538128, 
              0.483330562507924, 0.478196728279581, 0.466356471047664, 0.458880694980997, 
              0.445203812523939, 0.328511120556464, 0.451882416405062, 0.422039425876919, 
              0.434568234897284, 0.456046966754084, 0.429303988670518, 0.396413846996235, 
              0.47022068135355, 0.431215731274915, 0.47275070204514, 0.494238258356094, 
              0.447767018139506, 0.422635825516969, 0.409052142082599, 0.448541432328074, 
              0.435579336028336, 0.373702053711424, 0.470248046885876, 0.432263065254091, 
              0.105801697005539, 0.165491938888967, 0.20183149437323, 0.157063702812224, 
              0.191713328709327, 0.185909449591539, 0.117260129960307, 0.175617881152869, 
              0.113814659221755, 0.113311155284664, 0.156584334899961, 0.162156638432386, 
              0.123828653854679, 0.207413571886514, 0.290935977582346, 0.15476772878197, 
              0.127905386066107), PCoA2 = c(-0.0191341788343272, 0.125168726992728, 
              -0.202390361206006, 0.146490100247817, -0.200301849366383, -0.21201222640355, 
              -0.0494779672831398, -0.064867860513553, 0.216253685647168, 0.0867701091746775, 
              -0.0239106285841686, -0.219393777849861, -0.0230233856717296, 
              -0.0163292155724036, 0.0592152889677712, -0.151617651855222, 
              -0.0212007857869626, 0.115566927346534, -0.164139032263446, 0.0029531837582282, 
              -0.0794717197234781, -0.0293929147442873, -0.110911592901463, 
              -0.12927716953763, 0.117180336391695, -0.0149612458810882, -0.0960266562283222, 
              -0.049729047645075, -0.0682815196170832, -0.155403049889603, 
              -0.0959639620424382, 0.00136368493909292, 0.167474360903015, 
              -0.0890067395142755, -0.0659968504421273, -0.171984415395419, 
              -0.0226301121126082, -0.134899140182382, -0.0652732525645313, 
              -0.0141374766642671, 0.0221177088313266, -0.0277129408215808, 
              -0.0291571032251613, 0.0507403968305617, 0.0783047857485938, 
              0.0835416661450715, 0.125549464068238, -0.095888543364202, 0.023491497876642, 
              -0.0635018258483635, -0.0294487207421573, -0.108823584736146, 
              -0.108034914144809, -0.156157231183656, -0.103386745593316, -0.124905123364418, 
              -0.0133696028998138, 0.0250847436377295, -0.125441363122204, 
              0.131097655495749, 0.0836103428858159, 0.128893730469317, 0.167815421817788, 
              0.0777405296869226, 0.0168663114490739, -0.153183408017823, -0.0909348704469186, 
              -0.14021167293329, -0.0930419417310087, 0.0436877074394201, -0.00225319821767063, 
              0.0208146270029163, -0.00351947062266598, -0.16730817534396, 
              0.112557789748311, 0.0554538524404688, 0.181951585244428, 0.0759461647433655, 
              -0.151581431783219, 0.0096728144345489, 0.00330932981991135, 
              0.0960068455025288, -0.0641591706332058, -0.116160652954858, 
              0.147442288553806, 0.0501155209349927, 0.0275555597633417, 0.0888625671566081, 
              -0.0960489308344658, 0.199273481452986, -0.0189884667557609, 
              0.0618505397737743, -0.121028327612638, -0.0857532086504985, 
              -0.138181163935366, 0.0880040030562361, -0.11030465746869, 0.00878921664279682, 
              -0.199213478490711, -0.192215953774544, 0.022790698668874, 0.00383356262477344, 
              0.0935754276044847, -0.139033365702276, -0.103320887221478, 0.0690232335475765, 
              0.0472656020856467, 0.122565635077436, -0.140102298747715, 0.0641482609425624, 
              -0.176810035404832, -0.129649663515281, -0.0855188451064051, 
              -0.096270764279862, 0.109478651834638, -0.0857532086504985, -0.0858316607142512, 
              0.0368213601207682, -0.08334290200728, -0.0475226873282189, -0.145192106265334, 
              -0.11993047751714, -0.00120734838280895, 0.0575022964973828, 
              0.00454727511837668, -0.0280811381260746, 0.00247121743454203, 
              -0.0632716134030262, 0.124461113311219, -0.0761637363883439, 
              -0.0360108098776078, 0.0514130392932943, -0.0812002685815079, 
              -0.113227844995018, -0.0712238451169342, 0.134504778206784, -0.17693508834818, 
              -0.0970853669034015, -0.149542085154106, -0.00416243122148371, 
              -0.0811795737900006, -0.109000675404029, -0.0682005601517485, 
              0.0805657711155904, -0.00787090753563984, 0.0829178244917227, 
              0.110442089573397, 0.059352757695238, -0.171220731291405, -0.0127368984823977, 
              -0.0227265115960453, -0.0675998040040803, 0.0123777264850023, 
              -0.17625254924779, -0.0100481757634364, -0.033285894872971, -0.0577783326325101, 
              -0.0188681370367962, 0.0312342880777784, -0.098387985154546, 
              -0.0256414863685817, 0.0184863552430799, 0.081342134439549, 0.0550790041300113, 
              -0.0504704902398865, -0.0235293808461972, 0.00786459228718106, 
              0.00223796462343852, 0.030677240684701, -0.132949074151751, -0.128201691014533, 
              -0.0356569432107826, -0.176291205991176, -0.174661762085358, 
              -0.0515899060618532, -0.0981136270684708, -0.117463986256326, 
              -0.0706523168174752, -0.118822448801722, -0.126649486059274, 
              -0.0376663033777901, -0.0143437185846975, -0.0748295755480395, 
              -0.100429323740166, -0.0039755823248106, -0.160655309935462, 
              -0.195631974958725, -0.00593748743055577, -0.170527643205115, 
              -0.206854966976772, -0.123689486612512, -0.0830836612792189, 
              0.069991994970583, 0.048571476449164, -0.137897682849205, 0.0316577678958369, 
              -0.0239325410354453, -0.0893762813948878, -0.157903203693157, 
              0.0731082498062617, -0.0213416320051184, 0.0325948844591984, 
              -0.164645976312765, -0.179307495643697, -0.0786658467646943, 
              -0.00843473288236123, -0.164852727234645, 0.00726274039171153, 
              -0.185240419023864, -0.167149003041966, 0.510517771197614, 0.531555184470961, 
              0.457833699032652, 0.422918169057653, 0.433761403813814, 0.438221290363338, 
              0.483654601747502, 0.44751594154082, 0.457785577272894, 0.348138814437302, 
              0.488861204642494, 0.396585995233976, 0.440405884220264, 0.508177282257904, 
              0.274501006140909, 0.55809258587971, 0.546397282761865)), 
              class = "data.frame", row.names = c(NA, -227L))


I am not going to spend too much time describing what the two objects contain. Suffice to say that cent has the “coordinates” (PCoA1 and PCoA2) of centroid values calculated on the raw data stored in vect.

To help the graphical mapping of the data I first modified the data and add a couple of columns for shape and color of variables. I use a for(){} loop and a regular expression grep() with matching patterns to associate the right color/shape with the appropriate value.

# adding colors
# cent is rather small, only 11 entries
# I can specify by hand the various colors I want
cent$colour <- c("pink", "yellow", "deeppink", "gold",
                 "pink", "yellow", "deeppink", "gold",
                 "pink", "deeppink", "gold")
cent$shape <- NA

# I use a matching expression
# to add a shape factor in the right place
cent$shape[grep(".2012", cent$grps)] <- "a"
cent$shape[grep(".2014", cent$grps)] <- "b"
cent$shape[grep(".2015", cent$grps)] <- "c"

# adding empty variables that will be populated later
vect$v_PCoA1 <- NA
vect$v_PCoA2 <- NA
vect$colour <- NA
vect$shape <- NA

# Populating the empty variables with the 
# coordinates of centroids to draw segments
# in the plot 
for (y in 1:length(vect$group)) {
  for (i in 1:length(cent$grps)) {
    if (vect$group[y] == cent$grps[i]) {
      vect$v_PCoA1[y] <-  cent$PCoA1[i]
      vect$v_PCoA2[y] <-  cent$PCoA2[i]
    } 
  }
}

# One more matching expression
# to add a the appropriate color based on 
# species and sex

vect$colour[grep("P.M.", vect$group)] <- "deeppink"
vect$colour[grep("P.F.", vect$group)] <- "pink"
vect$colour[grep("Y.M.", vect$group)] <- "gold"
vect$colour[grep("Y.F.", vect$group)] <- "yellow"

vect$shape[grep(".2012", vect$group)] <- "a"
vect$shape[grep(".2014", vect$group)] <- "b"
vect$shape[grep(".2015", vect$group)] <- "c"

At this point I am almost there. The only one piece that is missing is to create a convex hull for each group of data of interest.

# Calculating the convex hull
# Basically this is a polygon 
hull_2012_1 <- vect[vect$group=="P.M.2012",1:3][chull(vect[vect$group=="P.M.2012",2:3]),]
hull_2012_2 <- vect[vect$group=="P.F.2012",1:3][chull(vect[vect$group=="P.F.2012",2:3]),]
hull_2012_3 <- vect[vect$group=="Y.M.2012",1:3][chull(vect[vect$group=="Y.M.2012",2:3]),]
hull_2012_4 <- vect[vect$group=="Y.F.2012",1:3][chull(vect[vect$group=="Y.F.2012",2:3]),]
hull_2014_1 <- vect[vect$group=="P.M.2014",1:3][chull(vect[vect$group=="P.M.2014",2:3]),]
hull_2014_2 <- vect[vect$group=="P.F.2014",1:3][chull(vect[vect$group=="P.F.2014",2:3]),]
hull_2014_3 <- vect[vect$group=="Y.M.2014",1:3][chull(vect[vect$group=="Y.M.2014",2:3]),]
hull_2014_4 <- vect[vect$group=="Y.F.2014",1:3][chull(vect[vect$group=="Y.F.2014",2:3]),]
hull_2015_1 <- vect[vect$group=="P.M.2015",1:3][chull(vect[vect$group=="P.M.2015",2:3]),]
hull_2015_2 <- vect[vect$group=="P.F.2015",1:3][chull(vect[vect$group=="P.F.2015",2:3]),]
hull_2015_3 <- vect[vect$group=="Y.M.2015",1:3][chull(vect[vect$group=="Y.M.2015",2:3]),]

Now I have everything I need and I can just use ggplot() to make my graph.

ggplot() +
  geom_polygon(data = hull_2012_1, aes(x = PCoA1, y = PCoA2), 
               colour = "black", alpha = 0, linetype = "dashed", size = .5) +
  geom_polygon(data = hull_2012_2, aes(x = PCoA1, y = PCoA2), 
               colour = "black", alpha = 0, linetype = "dashed", size = .5) +
  geom_polygon(data = hull_2012_3, aes(x = PCoA1, y = PCoA2), 
               colour = "black", alpha = 0, linetype = "dashed", size = .5) +
  geom_polygon(data = hull_2012_4, aes(x = PCoA1, y = PCoA2), 
               colour = "black", alpha = 0, linetype = "dashed", size = .5) +
  geom_polygon(data = hull_2014_1, aes(x = PCoA1, y = PCoA2), 
               colour = "black", alpha = 0, linetype = "dashed", size = .5) +
  geom_polygon(data = hull_2014_2, aes(x = PCoA1, y = PCoA2), 
               colour = "black", alpha = 0, linetype = "dashed", size = .5) +
  geom_polygon(data = hull_2014_3, aes(x = PCoA1, y = PCoA2), 
               colour = "black", alpha = 0, linetype = "dashed", size = .5) +
  geom_polygon(data = hull_2014_4, aes(x = PCoA1, y = PCoA2), 
               colour = "black", alpha = 0, linetype = "dashed", size = .5) +
  geom_polygon(data = hull_2015_1, aes(x = PCoA1, y = PCoA2), 
               colour = "black", alpha = 0, linetype = "dashed", size = .5) +
  geom_polygon(data = hull_2015_2, aes(x = PCoA1, y = PCoA2), 
               colour = "black", alpha = 0, linetype = "dashed", size = .5) +
  geom_polygon(data = hull_2015_3, aes(x = PCoA1, y = PCoA2), 
               colour = "black", alpha = 0, linetype = "dashed", size = .5) +
  geom_segment(data = vect, aes(x=PCoA1, xend = v_PCoA1, y = PCoA2, yend = v_PCoA2),
               alpha = 0.3) +
  geom_point(data = vect, aes(x = PCoA1, y = PCoA2, fill = vect$colour, shape = vect$shape),
             colour = "black", size = 3, stroke = .6, alpha = 0.9,) +
  geom_point(data = cent, aes(x = PCoA1, y = PCoA2, fill = cent$colour, shape = cent$shape),
             colour = "black", size = 6, stroke = 2) + 
  coord_cartesian(xlim = c(-0.4, 0.5), ylim = c(-0.2, 0.6)) +
  scale_fill_manual(name = "Species/Sex",
                    guide = guide_legend(override.aes = list(shape = 22, size = 5)),
                    values = levels(factor(vect$colour)),
                    #breaks = levels(factor(vect$colour)),
                    labels = c("Male pink", "Male yellow",
                               "Female pink", "Female yellow")
  ) +
  scale_shape_manual(name = "Year",
                     guide = guide_legend(override.aes = list(fill = "black", size = 5)),
                     values = c(21, 22, 23),
                     labels = c("2012", "2014", "2015")
  ) 

How Molecular Ecologists Work: Hanna Kokko on tending her literature garden and learning by reviewing —

Location: University of Zurich Current Position: professor of evolutionary ecology What kind of research do you? Evolutionary ecology Can you use one word to describe the way you work? Multitaskingly. Is that a word? What specific strategies do you recommend for running (or establishing) a lab? Remember that the people you’re training are not clones…

via How Molecular Ecologists Work: Hanna Kokko on tending her literature garden and learning by reviewing —

How Molecular Ecologists Work: Daniel Cadena on the eclectic approach and getting the country view —

Welcome to “How Molecular Ecologists Work”, the interview series that asks scientists how they get stuff done. This week, I’m interviewing Carlos Daniel Cadena, Professor at Universidad de los Andes in Colombia. Daniel’s work cuts across a broad swath of evolution and ecology, but mostly focuses on the evolution and systematics of neotropical birds. I asked…

via How Molecular Ecologists Work: Daniel Cadena on the eclectic approach and getting the country view —

How Molecular Ecologists Work: Chris Jiggins on organic collaboration and family gardening —

Welcome to “How Molecular Ecologists Work”, the interview series that asks scientists how they get stuff done. This week’s interview is with Dr. Chris Jiggins from the University of Cambridge. Chris and his colleagues broadly study the adaptation and speciation of butterflies and moths. However, I’ve heard that if you look in a mirror and say…

via How Molecular Ecologists Work: Chris Jiggins on organic collaboration and family gardening —

How Molecular Ecologists Work: Katy Heath on being an expert sleeper and not over-analyzing —

Welcome to “How Molecular Ecologists Work”, the interview series that asks scientists how they get stuff done. This week’s interview is from Dr. Katy Heath from the Department of Plant Biology and the University of Illinois. Katy and her team study the evolutionary ecology of mutualisms using plant, fungal, and bacterial systems. Location: Urbana-Champaign IL Current…

via How Molecular Ecologists Work: Katy Heath on being an expert sleeper and not over-analyzing —

How Molecular Ecologists Work: Craig Primmer on solving problems on the bike and in the sauna —

Location: University of Helsinki, Finland (moved there this May after 12 years at the University of Turku, Finland) Current Position: Professor of Genomics (on leave of absence whilst holding an Academy of Finland research professorship) Current mobile device(s): Fairphone, iPad, Suunto Spartan Sport watch Current computer(s): HP EliteBook What kind of research do you? Evolutionary…

via How Molecular Ecologists Work: Craig Primmer on solving problems on the bike and in the sauna —

Jesus told me "Come forth, and you will have eternal life!" But I came fifth, and I got a toaster…