Commit

Finalizing formatting of functions practice session.
camilavargasp committed Feb 29, 2024
1 parent 07eaeb8 commit e433acb
Showing 2 changed files with 45 additions and 9 deletions.
52 changes: 43 additions & 9 deletions materials/sections/r-practice-function-cleaning-data.qmd
@@ -77,7 +77,7 @@ This is a handy package that requires a moderate amount of knowledge of `html` t

::: callout-note
## Read and explore data
Read in each data file and store the data frame as `shorebird_adult` and `shorebird_chick` accordingly. After reading the data, explore it in a new chunk or in the console using any function we have used during the lessons (e.g., `colnames()`, `glimpse()`).
Read in each data file and store the data frame as `nest_data`, `predator_survey`, and `egg_measures` accordingly. After reading the data, explore it in a new chunk or in the console using any function we have used during the lessons (e.g., `colnames()`, `glimpse()`).

:::

@@ -134,7 +134,7 @@ predator_comm_names <- left_join(predator_survey,
## Write a function to add species common name to any data frame.
How can you generalize the code from the previous question and make it into a function?

The idea is that you can use this function in any data frame that has a `species` column with the Bird Banding Laboratory Species Code.
The idea is that you can use this function in **any data frame** that has a column named `species` with the Bird Banding Laboratory Species Code.

:::
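
As a rough illustration of the generalization this callout asks for, the sketch below wraps a `left_join()` in a function so that the data frame becomes an argument. The function name `add_common_name` and both toy tables are hypothetical stand-ins made up for this example; only the column names `species`, `alpha_code`, and `common_name` come from the lesson.

```r
library(dplyr)

# Hypothetical toy tables; the lesson itself reads the real ones from files
species_toy <- tibble(
  alpha_code  = c("AMGP", "DUNL"),
  common_name = c("American Golden-Plover", "Dunlin")
)
predator_toy <- tibble(
  species = c("DUNL", "AMGP"),
  count   = c(2, 4)
)

# Wrap the join so any data frame with a `species` column can be passed in
add_common_name <- function(df, species) {
  left_join(df, species, by = c("species" = "alpha_code"))
}

# left_join() keeps the row order of the first table
predator_named <- add_common_name(predator_toy, species_toy)
```

Because the join key is hard-coded, this only works when the input really has a column named `species`, which is the constraint stated in the callout.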

@@ -184,36 +184,70 @@ assign_species_name <- function(df, species){
::: callout-note
## Use your function to clean names of each data frame

Create clean versions of the three data frames by applying the function you created, removing columns that you think are not necessary, and filtering out `NA` values.
Create clean versions of the three data frames by applying the function you created, removing columns that you think are not necessary (i.e., selecting the ones you want to keep), and filtering out `NA` values.

:::

```{r}
#| code-summary: "Answer"
## This is one solution.
predator_clean <- assign_species_name(predator_survey, species) %>%
select(year, site, date, common_name, count) %>%
filter(!is.na(common_name))
nest_location_clean <- assign_species_name(nest_data, species) %>%
select(year, site, nestID, common_name, lat_corrected, long_corrected)
select(year, site, nestID, common_name, lat_corrected, long_corrected) %>%
filter(!is.na(common_name))
eggs_clean <- assign_species_name(egg_measures, species) %>%
select(year, site, nestID, common_name, length, width)
select(year, site, nestID, common_name, length, width) %>%
filter(!is.na(common_name))
```

Congrats! Now you have clean data sets ready for analysis.

## Challenge
## Optional Challenge

::: callout-note
## Challenge

**Optional Extra Challenge**: For a little extra challenge, try to incorporate an `if` statement that looks for `NA` values in the common name field you are adding. What other conditionals might you include to make your function smarter?
For a little extra challenge, try to incorporate an `if` statement that looks for `NA` values in the common name field you are adding. What other conditionals might you include to make your function smarter?
:::

```{r}
#| code-summary: "Answer"
#' Function to add common name to data.frame according to the BBL list of species codes
#' @param df A data frame containing BBL species codes in column `species`
#' @param species A data frame defining BBL species codes with columns `alpha_code` and `common_name`
#' @return A data frame with original data df, plus the common name of species
assign_species_name <- function(df, species){
    # Stop early if either table is missing a column the join depends on
    if (!("alpha_code" %in% names(species)) ||
        !("species" %in% names(df)) ||
        !("common_name" %in% names(species))){
      stop("Tables appear to be formatted incorrectly.")
    }

    return_df <- left_join(df, species, by = c("species" = "alpha_code"))

    # Extra rows after a left join mean a code matched more than once
    if (nrow(return_df) > nrow(df)){
      warning("Joined table has more rows than original table. Check species table for duplicated code values.")
    }

    # Count rows whose species code found no common name
    n_na <- sum(is.na(return_df$common_name))
    if (n_na > 0){
      warning(paste("Common name has", n_na, "rows containing NA"))
    }

    return(return_df)
}
```
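
To see the checks in action, here is a small, hedged usage sketch. It uses a trimmed copy of the function above (the duplicate-row check is left out for brevity) so the chunk runs on its own, and the toy tables and the unmatched code `"XXXX"` are made up for illustration rather than taken from the survey data.

```r
library(dplyr)

# Trimmed copy of the answer above, kept only so this chunk is self-contained
assign_species_name <- function(df, species) {
  if (!all(c("alpha_code", "common_name") %in% names(species)) ||
      !("species" %in% names(df))) {
    stop("Tables appear to be formatted incorrectly.")
  }
  return_df <- left_join(df, species, by = c("species" = "alpha_code"))
  n_na <- sum(is.na(return_df$common_name))
  if (n_na > 0) {
    warning(paste("Common name has", n_na, "rows containing NA"))
  }
  return_df
}

# Hypothetical toy tables; "XXXX" has no match in the species table
species_toy <- tibble(
  alpha_code  = c("AMGP", "DUNL"),
  common_name = c("American Golden-Plover", "Dunlin")
)
survey_toy <- tibble(
  species = c("AMGP", "DUNL", "XXXX"),
  count   = c(3, 5, 1)
)

# Capture the warning text instead of letting it print
warn_msg <- tryCatch(
  assign_species_name(survey_toy, species_toy),
  warning = function(w) conditionMessage(w)
)

# A data frame without a `species` column triggers the stop()
err_msg <- tryCatch(
  assign_species_name(tibble(code = "AMGP"), species_toy),
  error = function(e) conditionMessage(e)
)

# The join result is still returned when the warning is suppressed
result <- suppressWarnings(assign_species_name(survey_toy, species_toy))
```

Here `warn_msg` holds the NA warning and `err_msg` holds the formatting error, so both checks can be inspected without interrupting the session.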



@@ -224,7 +258,7 @@ Congrats! Now you have clean data sets ready for analysis.
You will likely realize at some point that the function we asked you to write is pretty simple. The code can in fact be written in a single line. So why write your own function for this? There are a couple of answers. The first and most obvious is that we want you to practice writing function syntax with simple examples. But there are other reasons why this operation might benefit from a function:

* Follow the DRY principles!
- If you find yourself doing the same cleaning steps on many of your data files, over and over again, those operations are good candidates for functions. This falls into that category, since we need to do the same transformation on both of the files we use here, and it would be even more useful if we incorporated more files from this dataset.
- If you find yourself doing the same cleaning steps on many of your data files, over and over again, those operations are good candidates for functions. This falls into that category, since we need to do the same transformation on all of the files we use here, and it would be even more useful if we incorporated more files from the dataset.
* Add custom warnings and quality control.
- Functions allow you to incorporate quality control through conditional statements coupled with warnings. Instead of checking for NA's or duplicated rows after you run a join, you can check within the function and return a warning if any are found.
* Check your function input more carefully
2 changes: 2 additions & 0 deletions materials/session_17.qmd
@@ -14,5 +14,7 @@ format:





{{< include /sections/r-practice-function-cleaning-data.qmd >}}
