Finalizing formatting of functions practice session.

NCEAS · Feb 29, 2024 · e433acb
1 parent 07eaeb8

Showing 2 changed files with 45 additions and 9 deletions.
diff --git a/materials/sections/r-practice-function-cleaning-data.qmd b/materials/sections/r-practice-function-cleaning-data.qmd
@@ -77,7 +77,7 @@ This is a handy package that requires a moderate amount of knowledge of `html` t
 
 ::: callout-note
 ## Read and explore data
-Read in each data file and store the data frame as `shorebird_adult` and `shorebird_chick` accordingly. After reading the data, insert a new chunk or in the console, explore the data using any function we have used during the lessons (eg. `colname()`, `glimpse()`)
+Read in each data file and store the data frames as `nest_data`, `predator_survey`, and `egg_measures` accordingly. After reading the data, explore it in a new chunk or in the console using any function we have used during the lessons (e.g. `colnames()`, `glimpse()`).
 
 ::: 
 
@@ -134,7 +134,7 @@ predator_comm_names <- left_join(predator_survey,
 ## Write a function to add species common name to any data frame.
 How can you generalize the code from the previous question and make it into a function?
 
-The idea is that you can use this function in any data frame that has a `species` column with the Bird Banding Laboratory Species Code.
+The idea is that you can use this function in **any data frame** that has a column named `species` with the Bird Banding Laboratory Species Code.
 
 :::
 
@@ -184,36 +184,70 @@ assign_species_name <- function(df, species){
 ::: callout-note
 ## Use your function to clean names of each data frame
 
-Create clean versions of the three data frames by applying the function you created and removing columns that you think are note necessary and filter out `NA` values.
+Create clean versions of the three data frames by applying the function you created, removing columns that you think are not necessary (i.e., selecting the ones you want to keep), and filtering out `NA` values.
 
 :::
 
 ```{r}
 #| code-summary: "Answer"
  
 ## This is one solution. 
-
 predator_clean <- assign_species_name(predator_survey, species) %>% 
     select(year, site, date, common_name, count) %>% 
     filter(!is.na(common_name))
 
 nest_location_clean <- assign_species_name(nest_data, species) %>% 
-    select(year, site, nestID, common_name, lat_corrected, long_corrected)
+    select(year, site, nestID, common_name, lat_corrected, long_corrected) %>% 
+    filter(!is.na(common_name))
 
 eggs_clean <- assign_species_name(egg_measures, species) %>% 
-    select(year, site, nestID, common_name, length, width)
+    select(year, site, nestID, common_name, length, width) %>% 
+    filter(!is.na(common_name))
+
 ```
 
 Congrats! Now you have clean data sets ready for analysis.
 
-## Challenge
+## Optional Challenge
 
 ::: callout-note
 ## Challenge
 
-**Optional Extra Challenge**: For a little extra challenge, try to incorporate an `if` statement that looks for `NA` values in the common name field you are adding. What other conditionals might you include to make your function smarter?
+For a little extra challenge, try to incorporate an `if` statement that looks for `NA` values in the common name field you are adding. What other conditionals might you include to make your function smarter?
 :::
 
+```{r}
+#| code-summary: "Answer"
+
+#' Function to add common name to data.frame according to the BBL list of species codes
+
+#' @param df A data frame containing BBL species codes in column `species`
+#' @param species A data frame defining BBL species codes with columns `alpha_code` and `common_name`
+#' @return A data frame with original data df, plus the common name of species
+
+assign_species_name <- function(df, species){
+    if (!("alpha_code" %in% names(species)) |
+        !("species" %in% names(df)) |
+        !("common_name" %in% names(species))){
+      stop("Tables appear to be formatted incorrectly.")
+    }  
+  
+    return_df <- left_join(df, species, by = c("species" = "alpha_code"))
+    
+    if (nrow(return_df) > nrow(df)){
+      warning("Joined table has more rows than original table. Check species table for duplicated code values.")
+    }
+    
+    if (length(which(is.na(return_df$common_name))) > 0){
+      x <- length(which(is.na(return_df$common_name)))
+      warning(paste("Common name has", x, "rows containing NA"))
+    }
+    
+    return(return_df)
+        
+}
+
+```
 
 
 
@@ -224,7 +258,7 @@ Congrats! Now you have clean data sets ready for analysis.
 You will likely at some point realize that the function we asked you to write is pretty simple. The code can in fact be accomplished in a single line. So why write your own function for this? There are a couple of answers. The first and most obvious is that we want you to practice writing function syntax with simple examples. But there are other reasons why this operation might benefit from a function:
 
 * Follow the DRY principles!
-    - If you find yourself doing the same cleaning steps on many of your data files, over and over again, those operations are good candidates for functions. This falls into that category, since we need to do the same transformation on both of the files we use here, and if we incorporated more files from this dataset it would come in even more use.
+    - If you find yourself doing the same cleaning steps on many of your data files, over and over again, those operations are good candidates for functions. This falls into that category, since we need to do the same transformation on all of the files we use here, and if we incorporated more files from the dataset it would be even more useful.
 * Add custom warnings and quality control.
     - Functions allow you to incorporate quality control through conditional statements coupled with warnings. Instead of checking for NA's or duplicated rows after you run a join, you can check within the function and return a warning if any are found.
 * Check your function input more carefully

diff --git a/materials/session_17.qmd b/materials/session_17.qmd
@@ -14,5 +14,7 @@ format:
 
 
 
+
+
 {{< include /sections/r-practice-function-cleaning-data.qmd >}}