Category Archives: R!

Nested for and else_if loop in R to create a new variable. Is there a better way to do this?

I am trying to get some input on how to make my code better/faster.

I have been working with sf objects lately, and I have to do a bunch of operations to figure out whether certain points I collected are within certain shape files and then run some correlations.

My input points are organized in a list of sf objects. For every sf object there is a data frame with a number of variables describing characteristics of the various points.

In particular, there is one variable describing whether the points were collected within or outside a certain geographic area (“IN” or “OUT”).

Unfortunately, when a point was collected in the field I had no way to test whether I really was within a specific polygon or not and I marked “IN” or “OUT” based on my personal experience and knowledge of the field site.

So, what I am doing now is empirically testing whether the points that I collected in the field are actually within the shapefile/polygon of interest. I have already figured out all the code to do so, and I am posting here a mock example to pick your brain and get insight on how I could make it faster/better.

My real data points are over 3000, organized in some 25 sf objects within a list. My mock example will have only 90 data points organized in 2 sf objects within a list

First I create 2 fake data frames and I transform them in 2 sf objects with specific CRS.

    df1 <- data.frame(id = c(paste("a_", 1:15, sep = ""), 
                            paste("b_", 1:15, sep = ""), 
                            paste("c_", 1:15, sep = "")),
                     species = rep("blabla", 45),
                     fondo = rep("pol_1", 45),
                     in_out = sample(c("in", "out"), 45, replace = T), #randomly assigning in or out variable
                     long = sample(1994764:1995265, 45, replace = T),
                     lat = sample(215659:216563, 45, replace = T))

    df_sf1 <- st_as_sf(df1,
             coords = c("long", "lat"),
             na.fail = FALSE,
             crs = "+proj=utm +zone=19 +datum=WGS84"
    )
    df2 <- data.frame(id = c(paste("a_", 1:15, sep = ""), 
                             paste("b_", 1:15, sep = ""), 
                             paste("c_", 1:15, sep = "")),
                      species = rep("blabla", 45),
                      in_out = sample(c("in", "out"), 45, replace = T),
                      long = sample(1994764:1995265, 45, replace = T),
                      lat = sample(215659:216563, 45, replace = T))

    df_sf2 <- st_as_sf(df2,
                       coords = c("long", "lat"),
                       na.fail = FALSE,
                       crs = "+proj=utm +zone=19 +datum=WGS84"
    )

Then I sample some points from the 2 dfs to create a couple of fake polygons that will substitute the shape files I am using in my real analysis. They will be transformed to sf objects as well.

    df_sf_polygon <- st_multipoint(st_coordinates(df_sf[c(29, 21, 27, 3, 38), ]))
    df_sf_polygon <- st_sf(geom = st_sfc(df_sf_polygon))
    df_sf_polygon <- st_set_crs(df_sf_polygon, "+proj=utm +zone=19 +datum=WGS84")
    df_sf_polygon <- st_cast(df_sf_polygon, "POLYGON")

    df_sf_polygon2 <- st_multipoint(st_coordinates(df_sf2[c(3, 21, 29), ]))
    df_sf_polygon2 <- st_sf(geom = st_sfc(df_sf_polygon2))
    df_sf_polygon2 <- st_set_crs(df_sf_polygon2, "+proj=utm +zone=19 +datum=WGS84")
    df_sf_polygon2 <- st_cast(df_sf_polygon2, "POLYGON")

I can plot the data so far to have a feel for what is going on.

   tm_shape(df_sf1) +
      tm_dots(col = "red",
      size = 0.8) +
      tm_shape(df_sf2) +
      tm_dots(col = "green",
      size = 0.8) +) +
      tm_shape(sp_df_sf_polygon) +
      tm_borders() +
      tm_shape(sp_df_sf_polygon2) +
      tm_borders()

As you can probably see, there are numerous pints that fall outside of the two polygons that I have (remember, this is a fake example, so it doesn’t matter if the polygons aren’t perfect or if they intersect with each other). So now I want to test if the information in the column in_out jibe with what really happened in the field. To do so I have written a nested for{} loop with if{} and else_if{} statements (in this mock example I am only going to use a for{} and an if{} statements).

The loop checks whether every single point falls within a certain polygon/shapefile. If it does the character string “IN” is added to a new variable called in_out_nest_2, if it doesn’t “OUT” is added.

    list_data_sf <- list(df_sf1 = df_sf1, df_sf2 = df_sf2) # organize sf objects in a list

And here is the loop!

    for (x in 1:length(names(list_data_sf))) {
      if (grepl("df_sf1", names(list_data_sf[x]), fixed = T) == T) {
        for (y in 1:length(list_data_sf[[x]]$id)) {
          if (list_data_sf[[x]]$id[y] %in% list_data_sf[[x]][df_sf_polygon, ]$id){
            list_data_sf[[x]]$in_out_nest_2[y] <- "IN"
          } else {
            list_data_sf[[x]]$in_out_nest_2[y] <- "OUT"
          }
        }
      } else {
        for (y in 1:length(list_data_sf[[x]]$id)) {
          if (list_data_sf[[x]]$id[y] %in% list_data_sf[[x]][df_sf_polygon2, ]$id){
            list_data_sf[[x]]$in_out_nest_2[y] <- "IN"
          } else {
            list_data_sf[[x]]$in_out_nest_2[y] <- "OUT"
          }
        }
      }
    }

The code seem to be working just fine, even though, with the data that I will be collecting in the future, it could become really slow. I am open/interested in suggestions to make this code faster/better.

Thanks for all the help!

P.S. Below is my sessionInfo()

    sessionInfo()
    #> R version 3.5.2 (2018-12-20)
    #> Platform: x86_64-apple-darwin15.6.0 (64-bit)
    #> Running under: macOS  10.15.5
    #> 
    #> Matrix products: default
    #> BLAS: /Library/Frameworks/R.framework/Versions/3.5/Resources/lib/libRblas.0.dylib
    #> LAPACK: /Library/Frameworks/R.framework/Versions/3.5/Resources/lib/libRlapack.dylib
    #> 
    #> locale:
    #> [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
    #> 
    #> attached base packages:
    #> [1] stats     graphics  grDevices utils     datasets  methods   base     
    #> 
    #>
    #> other attached packages:
    #>  [1] reprex_0.3.0        viridis_0.5.1       viridisLite_0.3.0  
    #>  [4] vegan_2.5-6         permute_0.9-5       units_0.6-3        
    #>  [7] forcats_0.5.0       stringr_1.4.0       purrr_0.3.4        
    #> [10] readr_1.3.1         tidyverse_1.3.0     tidyr_1.1.0        
    #> [13] tibble_3.0.1        tmap_3.0            spatstat_1.64-1    
    #> [16] rpart_4.1-15        nlme_3.1-148        spatstat.data_1.4-3
    #> [19] sf_0.9-4            scales_1.1.1        RMySQL_0.10.20     
    #> [22] DBI_1.1.0           Rmisc_1.5           rgdal_1.5-12       
    #> [25] readxl_1.3.1        plyr_1.8.6          moveVis_0.10.5     
    #> [28] lubridate_1.7.9     leaflet_2.0.3       lattice_0.20-41    
    #> [31] hms_0.5.3           hydroTSM_0.6-0      xts_0.12-0         
    #> [34] zoo_1.8-8           GISTools_0.7-4      rgeos_0.5-3        
    #> [37] MASS_7.3-51.6       RColorBrewer_1.1-2  maptools_1.0-1     
    #> [40] ggspatial_1.1.3     ggridges_0.5.2      ggrepel_0.8.2      
    #> [43] ggmap_3.0.0         ggplot2_3.3.2       dplyr_1.0.0        
    #> [46] dismo_1.1-4         raster_3.1-5        sp_1.4-2           
    #> [49] data.table_1.12.8   corrplot_0.84       colorspace_1.4-1   
    #> [52] chron_2.3-55        anytime_0.3.7
    #>      
    #> loaded via a namespace (and not attached):
    #>  [1] compiler_3.5.2  magrittr_1.5    tools_3.5.2     htmltools_0.5.0
    #>  [5] yaml_2.2.1      stringi_1.4.6   rmarkdown_2.3   highr_0.8      
    #>  [9] knitr_1.29      stringr_1.4.0   xfun_0.15       digest_0.6.25  
    #> [13] rlang_0.4.6     evaluate_0.14

Custom multivariate spread of variance

Recently I had to deal with reviewing some figures I prepared for a manuscript that my colleagues and I submitted for publication.

One figure in particular needed a little bit of “re-styling”. It was the representation of the multivariate-spread of variance using a Principal Component Analysis. My dataset was fairly complex with two species of large reptiles sampled across three years.

The default graphical output produced using plot() is shown below.

With a nice explanatory caption the image could work, but it probably would be best to try and make it from scratch. I struggled a while to figure out the appropriate code to make an image that could deliver the important and appropriate information. I eventually stumbled upon this post that gave me a bunch of good ideas on how to develop my own code to make better graphics.

The main idea is that I wanted to graphically represent the distinction between the two different species sampled across multiple years. I also wanted to highlight differences between sexes. Below is the code and thought process that I followed!

First, the data. To make the example reproducible I used dput() from my R! session and copied the data. The code block below creates two objects: cent and vect. It is just numbers. Feel free to use them to reproduce the example.

cent <- structure(list(grps = structure(c(1L, 7L, 4L, 9L, 2L, 8L, 5L, 10L, 3L, 6L, 11L), 
                          .Label = c("P.F.2012", "P.F.2014", "P.F.2015", 
                          "P.M.2012", "P.M.2014", "P.M.2015", "Y.F.2012", "Y.F.2014", "Y.M.2012", 
                          "Y.M.2014", "Y.M.2015"), class = "factor"), PCoA1 = c(-0.232308295213946, 
                          -0.139600907716025, -0.23871740447522, -0.13289458216504, 0.447510859027901, 
                          0.408846369396462, 0.421557222881589, 0.420000775027015, 0.126708509926023, 
                          0.162432592104704, 0.173235470911422), PCoA2 = c(-0.0767320953158099, 
                          -0.0309422674481247, -0.0595237554222932, 0.0521870006926146, 
                          -0.0196829955701272, -0.0966715610040982, -0.0116833317430039, 
                          -0.126017773437853, 0.472363514000923, 0.526616708002587, 0.410527734672885
                          )), row.names = c("P.F.2012", "Y.F.2012", "P.M.2012", "Y.M.2012", 
                          "P.F.2014", "Y.F.2014", "P.M.2014", "Y.M.2014", "P.F.2015", "P.M.2015", 
                          "Y.M.2015"), class = "data.frame")

vect <- structure(list(group = structure(c(3L, 4L, 1L, 2L, 2L, 2L, 1L, 
              3L, 4L, 4L, 2L, 2L, 2L, 2L, 2L, 2L, 4L, 2L, 4L, 4L, 3L, 3L, 1L, 
              3L, 2L, 3L, 3L, 1L, 1L, 1L, 3L, 3L, 4L, 3L, 1L, 3L, 3L, 3L, 3L, 
              3L, 3L, 2L, 4L, 2L, 4L, 4L, 4L, 4L, 3L, 3L, 3L, 3L, 1L, 1L, 3L, 
              1L, 1L, 3L, 3L, 4L, 4L, 4L, 2L, 2L, 3L, 3L, 1L, 3L, 3L, 4L, 3L, 
              1L, 3L, 1L, 4L, 3L, 4L, 3L, 4L, 2L, 4L, 2L, 1L, 3L, 4L, 3L, 4L, 
              3L, 4L, 4L, 3L, 1L, 3L, 2L, 3L, 4L, 1L, 4L, 2L, 2L, 2L, 1L, 4L, 
              2L, 2L, 4L, 4L, 4L, 3L, 4L, 3L, 3L, 3L, 3L, 2L, 3L, 3L, 3L, 3L, 
              3L, 2L, 4L, 3L, 3L, 3L, 4L, 4L, 4L, 4L, 1L, 4L, 4L, 1L, 3L, 1L, 
              4L, 3L, 3L, 2L, 2L, 1L, 3L, 1L, 2L, 2L, 4L, 4L, 2L, 2L, 1L, 8L, 
              8L, 6L, 8L, 6L, 7L, 5L, 7L, 7L, 8L, 8L, 7L, 7L, 7L, 7L, 5L, 7L, 
              7L, 7L, 7L, 8L, 5L, 8L, 8L, 7L, 5L, 7L, 7L, 8L, 8L, 5L, 5L, 5L, 
              5L, 6L, 6L, 8L, 5L, 8L, 8L, 8L, 8L, 5L, 5L, 6L, 5L, 6L, 8L, 8L, 
              7L, 5L, 5L, 6L, 6L, 6L, 5L, 6L, 6L, 6L, 6L, 9L, 10L, 11L, 9L, 
              11L, 11L, 9L, 11L, 11L, 11L, 10L, 11L, 11L, 10L, 11L, 10L, 10L), 
              .Label = c("P.F.2012", "Y.F.2012", "P.M.2012", "Y.M.2012", 
              "P.F.2014", "Y.F.2014", "P.M.2014", "Y.M.2014", "P.F.2015", "P.M.2015", 
              "Y.M.2015"), class = "factor"), PCoA1 = c(-0.230399563384464, 
              -0.134660213481351, -0.203754629089395, -0.256544903538633, -0.0963269236640878, 
              -0.0694337387298244, -0.281789671494329, -0.186386572449756, 
              -0.225180773660388, -0.171370571290936, -0.185937653177261, -0.0313349898106232, 
              -0.177414154493466, -0.286788004227091, -0.057096258535677, 0.0311506121862437, 
              0.00737219333238985, -0.23944399904694, 0.0250201000172532, -0.124498498733216, 
              -0.318666093918789, -0.263058322357503, -0.30437379108452, -0.317716787227452, 
              -0.22936858056337, -0.324020128865818, -0.156284741626089, -0.234488238994369, 
              -0.322947505553436, -0.269254541985182, -0.202171132788021, -0.184842644761263, 
              -0.195859234992588, -0.282295012488084, -0.207769950061687, -0.0432998408453922, 
              -0.278422767726366, -0.00425519745151709, -0.341252606885391, 
              -0.314974083712529, -0.322809684519165, -0.0261828937940813, 
              -0.111006443736141, -0.129545880345861, -0.144589704542049, -0.215927245615552, 
              -0.172530265813857, 0.0341339808656641, -0.301776947671195, -0.33372174237259, 
              -0.186333103181287, -0.197020579486089, -0.178341113926678, -0.277170972770639, 
              -0.250578381488708, -0.198668672218084, -0.260180040538445, -0.231282025822095, 
              -0.316097034833028, -0.168722470602506, -0.175171067223132, -0.214789024421525, 
              -0.215777252772746, -0.230687622399837, -0.216584634169675, -0.290464154917697, 
              -0.217674204709272, -0.309944103911973, -0.334205206599481, -0.110429776451595, 
              -0.26762240366662, -0.131455593967047, -0.168378287926015, -0.295041743014695, 
              -0.241407574594379, -0.230288558216916, -0.254831792720723, -0.124513765659242, 
              0.100275505577226, -0.134938432890682, -0.0558328735565296, -0.252848991866849, 
              -0.164579672506183, -0.197494806897518, -0.141098827666042, -0.254971309037345, 
              -0.167242702726603, -0.23236613623913, 0.0395174244388881, -0.239609836805462, 
              -0.276340875811544, -0.143108361747883, -0.211025748926658, -0.181045470349234, 
              -0.253458918772677, -0.173851745508372, -0.226500308336377, -0.125557875115246, 
              -0.100646606550761, 0.1292556268151, -0.190044071510094, -0.239993579830368, 
              -0.196185432695898, -0.169723453666052, 0.0840757691524504, -0.0828552732030694, 
              -0.18867602251596, -0.240262651678313, -0.240562707504407, -0.195571582299229, 
              -0.29368079508683, -0.189669520157007, -0.106469206939931, -0.159254877744035, 
              -0.252544984788576, -0.181045470349234, -0.251501044444335, -0.14873585609214, 
              -0.341409312007338, -0.29422201212156, 0.00583753495184317, 0.0101775937951025, 
              -0.233187528813313, -0.226864081321846, -0.244494472376726, -0.171210842923591, 
              -0.0595344383338595, -0.0571697905980911, -0.0342463649025507, 
              -0.289206711262372, -0.0433007380702916, -0.24157131439507, -0.181225503601252, 
              -0.298959942537002, -0.248514795038792, -0.279157389102156, -0.231035001544293, 
              -0.234516167629517, -0.0873790575474149, -0.114569468928722, 
              -0.309163789150021, -0.259807938377265, -0.15242487724223, -0.282032055003666, 
              -0.226296767992886, -0.0824717212910516, -0.228792585525858, 
              -0.315322705170076, -0.0386727609378934, -0.237770817011458, 
              0.348112407782663, 0.356305338806301, 0.259899300706282, 0.418475142444894, 
              0.363700298369635, 0.360103274061879, 0.396746129120811, 0.438573922603235, 
              0.382239976357167, 0.388295212367119, 0.300273822578281, 0.364027851104267, 
              0.360612509938702, 0.437527965244732, 0.47438425812642, 0.471126779343484, 
              0.405017558499106, 0.46774443014255, 0.442403233583862, 0.458968573092587, 
              0.464042031205564, 0.451745508543372, 0.466751018499985, 0.445124133569052, 
              0.481801433158007, 0.463260537321389, 0.400227866051992, 0.418509759984409, 
              0.454496511114008, 0.449220367124618, 0.386587934663972, 0.427091644994218, 
              0.450605988088211, 0.441802433298112, 0.384144699353354, 0.452138792538128, 
              0.483330562507924, 0.478196728279581, 0.466356471047664, 0.458880694980997, 
              0.445203812523939, 0.328511120556464, 0.451882416405062, 0.422039425876919, 
              0.434568234897284, 0.456046966754084, 0.429303988670518, 0.396413846996235, 
              0.47022068135355, 0.431215731274915, 0.47275070204514, 0.494238258356094, 
              0.447767018139506, 0.422635825516969, 0.409052142082599, 0.448541432328074, 
              0.435579336028336, 0.373702053711424, 0.470248046885876, 0.432263065254091, 
              0.105801697005539, 0.165491938888967, 0.20183149437323, 0.157063702812224, 
              0.191713328709327, 0.185909449591539, 0.117260129960307, 0.175617881152869, 
              0.113814659221755, 0.113311155284664, 0.156584334899961, 0.162156638432386, 
              0.123828653854679, 0.207413571886514, 0.290935977582346, 0.15476772878197, 
              0.127905386066107), PCoA2 = c(-0.0191341788343272, 0.125168726992728, 
              -0.202390361206006, 0.146490100247817, -0.200301849366383, -0.21201222640355, 
              -0.0494779672831398, -0.064867860513553, 0.216253685647168, 0.0867701091746775, 
              -0.0239106285841686, -0.219393777849861, -0.0230233856717296, 
              -0.0163292155724036, 0.0592152889677712, -0.151617651855222, 
              -0.0212007857869626, 0.115566927346534, -0.164139032263446, 0.0029531837582282, 
              -0.0794717197234781, -0.0293929147442873, -0.110911592901463, 
              -0.12927716953763, 0.117180336391695, -0.0149612458810882, -0.0960266562283222, 
              -0.049729047645075, -0.0682815196170832, -0.155403049889603, 
              -0.0959639620424382, 0.00136368493909292, 0.167474360903015, 
              -0.0890067395142755, -0.0659968504421273, -0.171984415395419, 
              -0.0226301121126082, -0.134899140182382, -0.0652732525645313, 
              -0.0141374766642671, 0.0221177088313266, -0.0277129408215808, 
              -0.0291571032251613, 0.0507403968305617, 0.0783047857485938, 
              0.0835416661450715, 0.125549464068238, -0.095888543364202, 0.023491497876642, 
              -0.0635018258483635, -0.0294487207421573, -0.108823584736146, 
              -0.108034914144809, -0.156157231183656, -0.103386745593316, -0.124905123364418, 
              -0.0133696028998138, 0.0250847436377295, -0.125441363122204, 
              0.131097655495749, 0.0836103428858159, 0.128893730469317, 0.167815421817788, 
              0.0777405296869226, 0.0168663114490739, -0.153183408017823, -0.0909348704469186, 
              -0.14021167293329, -0.0930419417310087, 0.0436877074394201, -0.00225319821767063, 
              0.0208146270029163, -0.00351947062266598, -0.16730817534396, 
              0.112557789748311, 0.0554538524404688, 0.181951585244428, 0.0759461647433655, 
              -0.151581431783219, 0.0096728144345489, 0.00330932981991135, 
              0.0960068455025288, -0.0641591706332058, -0.116160652954858, 
              0.147442288553806, 0.0501155209349927, 0.0275555597633417, 0.0888625671566081, 
              -0.0960489308344658, 0.199273481452986, -0.0189884667557609, 
              0.0618505397737743, -0.121028327612638, -0.0857532086504985, 
              -0.138181163935366, 0.0880040030562361, -0.11030465746869, 0.00878921664279682, 
              -0.199213478490711, -0.192215953774544, 0.022790698668874, 0.00383356262477344, 
              0.0935754276044847, -0.139033365702276, -0.103320887221478, 0.0690232335475765, 
              0.0472656020856467, 0.122565635077436, -0.140102298747715, 0.0641482609425624, 
              -0.176810035404832, -0.129649663515281, -0.0855188451064051, 
              -0.096270764279862, 0.109478651834638, -0.0857532086504985, -0.0858316607142512, 
              0.0368213601207682, -0.08334290200728, -0.0475226873282189, -0.145192106265334, 
              -0.11993047751714, -0.00120734838280895, 0.0575022964973828, 
              0.00454727511837668, -0.0280811381260746, 0.00247121743454203, 
              -0.0632716134030262, 0.124461113311219, -0.0761637363883439, 
              -0.0360108098776078, 0.0514130392932943, -0.0812002685815079, 
              -0.113227844995018, -0.0712238451169342, 0.134504778206784, -0.17693508834818, 
              -0.0970853669034015, -0.149542085154106, -0.00416243122148371, 
              -0.0811795737900006, -0.109000675404029, -0.0682005601517485, 
              0.0805657711155904, -0.00787090753563984, 0.0829178244917227, 
              0.110442089573397, 0.059352757695238, -0.171220731291405, -0.0127368984823977, 
              -0.0227265115960453, -0.0675998040040803, 0.0123777264850023, 
              -0.17625254924779, -0.0100481757634364, -0.033285894872971, -0.0577783326325101, 
              -0.0188681370367962, 0.0312342880777784, -0.098387985154546, 
              -0.0256414863685817, 0.0184863552430799, 0.081342134439549, 0.0550790041300113, 
              -0.0504704902398865, -0.0235293808461972, 0.00786459228718106, 
              0.00223796462343852, 0.030677240684701, -0.132949074151751, -0.128201691014533, 
              -0.0356569432107826, -0.176291205991176, -0.174661762085358, 
              -0.0515899060618532, -0.0981136270684708, -0.117463986256326, 
              -0.0706523168174752, -0.118822448801722, -0.126649486059274, 
              -0.0376663033777901, -0.0143437185846975, -0.0748295755480395, 
              -0.100429323740166, -0.0039755823248106, -0.160655309935462, 
              -0.195631974958725, -0.00593748743055577, -0.170527643205115, 
              -0.206854966976772, -0.123689486612512, -0.0830836612792189, 
              0.069991994970583, 0.048571476449164, -0.137897682849205, 0.0316577678958369, 
              -0.0239325410354453, -0.0893762813948878, -0.157903203693157, 
              0.0731082498062617, -0.0213416320051184, 0.0325948844591984, 
              -0.164645976312765, -0.179307495643697, -0.0786658467646943, 
              -0.00843473288236123, -0.164852727234645, 0.00726274039171153, 
              -0.185240419023864, -0.167149003041966, 0.510517771197614, 0.531555184470961, 
              0.457833699032652, 0.422918169057653, 0.433761403813814, 0.438221290363338, 
              0.483654601747502, 0.44751594154082, 0.457785577272894, 0.348138814437302, 
              0.488861204642494, 0.396585995233976, 0.440405884220264, 0.508177282257904, 
              0.274501006140909, 0.55809258587971, 0.546397282761865)), 
              class = "data.frame", row.names = c(NA, -227L))


I am not going to spend too much time describing what the two objects contain. Suffice to say that cent has the “coordinates” (PCoA1 and PCoA2) of centroid values calculated on the raw data stored in vect.

To help the graphical mapping of the data I first modified the data and add a couple of columns for shape and color of variables. I use a for(){} loop and a regular expression grep() with matching patterns to associate the right color/shape with the appropriate value.

# adding colors
# cent is rather small, only 11 entries
# I can specify by hand the various colors I want
cent$colour <- c("pink", "yellow", "deeppink", "gold",
                 "pink", "yellow", "deeppink", "gold",
                 "pink", "deeppink", "gold")
cent$shape <- NA

# I use a matching expression
# to add a shape factor in the right place
cent$shape[grep(".2012", cent$grps)] <- "a"
cent$shape[grep(".2014", cent$grps)] <- "b"
cent$shape[grep(".2015", cent$grps)] <- "c"

# adding empty variables that will be populated later
vect$v_PCoA1 <- NA
vect$v_PCoA2 <- NA
vect$colour <- NA
vect$shape <- NA

# Populating the empty variables with the 
# coordinates of centroids to draw segments
# in the plot 
for (y in 1:length(vect$group)) {
  for (i in 1:length(cent$grps)) {
    if (vect$group[y] == cent$grps[i]) {
      vect$v_PCoA1[y] <-  cent$PCoA1[i]
      vect$v_PCoA2[y] <-  cent$PCoA2[i]
    } 
  }
}

# One more matching expression
# to add a the appropriate color based on 
# species and sex

vect$colour[grep("P.M.", vect$group)] <- "deeppink"
vect$colour[grep("P.F.", vect$group)] <- "pink"
vect$colour[grep("Y.M.", vect$group)] <- "gold"
vect$colour[grep("Y.F.", vect$group)] <- "yellow"

vect$shape[grep(".2012", vect$group)] <- "a"
vect$shape[grep(".2014", vect$group)] <- "b"
vect$shape[grep(".2015", vect$group)] <- "c"

At this point I am almost there. The only one piece that is missing is to create a convex hull for each group of data of interest.

# Calculating the convex hull
# Basically this is a polygon 
hull_2012_1 <- vect[vect$group=="P.M.2012",1:3][chull(vect[vect$group=="P.M.2012",2:3]),]
hull_2012_2 <- vect[vect$group=="P.F.2012",1:3][chull(vect[vect$group=="P.F.2012",2:3]),]
hull_2012_3 <- vect[vect$group=="Y.M.2012",1:3][chull(vect[vect$group=="Y.M.2012",2:3]),]
hull_2012_4 <- vect[vect$group=="Y.F.2012",1:3][chull(vect[vect$group=="Y.F.2012",2:3]),]
hull_2014_1 <- vect[vect$group=="P.M.2014",1:3][chull(vect[vect$group=="P.M.2014",2:3]),]
hull_2014_2 <- vect[vect$group=="P.F.2014",1:3][chull(vect[vect$group=="P.F.2014",2:3]),]
hull_2014_3 <- vect[vect$group=="Y.M.2014",1:3][chull(vect[vect$group=="Y.M.2014",2:3]),]
hull_2014_4 <- vect[vect$group=="Y.F.2014",1:3][chull(vect[vect$group=="Y.F.2014",2:3]),]
hull_2015_1 <- vect[vect$group=="P.M.2015",1:3][chull(vect[vect$group=="P.M.2015",2:3]),]
hull_2015_2 <- vect[vect$group=="P.F.2015",1:3][chull(vect[vect$group=="P.F.2015",2:3]),]
hull_2015_3 <- vect[vect$group=="Y.M.2015",1:3][chull(vect[vect$group=="Y.M.2015",2:3]),]

Now I have everything I need and I can just use ggplot() to make my graph.

ggplot() +
  geom_polygon(data = hull_2012_1, aes(x = PCoA1, y = PCoA2), 
               colour = "black", alpha = 0, linetype = "dashed", size = .5) +
  geom_polygon(data = hull_2012_2, aes(x = PCoA1, y = PCoA2), 
               colour = "black", alpha = 0, linetype = "dashed", size = .5) +
  geom_polygon(data = hull_2012_3, aes(x = PCoA1, y = PCoA2), 
               colour = "black", alpha = 0, linetype = "dashed", size = .5) +
  geom_polygon(data = hull_2012_4, aes(x = PCoA1, y = PCoA2), 
               colour = "black", alpha = 0, linetype = "dashed", size = .5) +
  geom_polygon(data = hull_2014_1, aes(x = PCoA1, y = PCoA2), 
               colour = "black", alpha = 0, linetype = "dashed", size = .5) +
  geom_polygon(data = hull_2014_2, aes(x = PCoA1, y = PCoA2), 
               colour = "black", alpha = 0, linetype = "dashed", size = .5) +
  geom_polygon(data = hull_2014_3, aes(x = PCoA1, y = PCoA2), 
               colour = "black", alpha = 0, linetype = "dashed", size = .5) +
  geom_polygon(data = hull_2014_4, aes(x = PCoA1, y = PCoA2), 
               colour = "black", alpha = 0, linetype = "dashed", size = .5) +
  geom_polygon(data = hull_2015_1, aes(x = PCoA1, y = PCoA2), 
               colour = "black", alpha = 0, linetype = "dashed", size = .5) +
  geom_polygon(data = hull_2015_2, aes(x = PCoA1, y = PCoA2), 
               colour = "black", alpha = 0, linetype = "dashed", size = .5) +
  geom_polygon(data = hull_2015_3, aes(x = PCoA1, y = PCoA2), 
               colour = "black", alpha = 0, linetype = "dashed", size = .5) +
  geom_segment(data = vect, aes(x=PCoA1, xend = v_PCoA1, y = PCoA2, yend = v_PCoA2),
               alpha = 0.3) +
  geom_point(data = vect, aes(x = PCoA1, y = PCoA2, fill = vect$colour, shape = vect$shape),
             colour = "black", size = 3, stroke = .6, alpha = 0.9,) +
  geom_point(data = cent, aes(x = PCoA1, y = PCoA2, fill = cent$colour, shape = cent$shape),
             colour = "black", size = 6, stroke = 2) + 
  coord_cartesian(xlim = c(-0.4, 0.5), ylim = c(-0.2, 0.6)) +
  scale_fill_manual(name = "Species/Sex",
                    guide = guide_legend(override.aes = list(shape = 22, size = 5)),
                    values = levels(factor(vect$colour)),
                    #breaks = levels(factor(vect$colour)),
                    labels = c("Male pink", "Male yellow",
                               "Female pink", "Female yellow")
  ) +
  scale_shape_manual(name = "Year",
                     guide = guide_legend(override.aes = list(fill = "black", size = 5)),
                     values = c(21, 22, 23),
                     labels = c("2012", "2014", "2015")
  ) 

A Bayesian election prediction, implemented with R and Stan — R-bloggers

(This article was first published on Revolutions, and kindly contributed to R-bloggers) If the media coverage is anything to go by, people are desperate to know who will win the US election on November 8. Polls give us some indication of what’s likely to happen, but any single poll isn’t a great guide (despite the…

via A Bayesian election prediction, implemented with R and Stan — R-bloggers