Data Cleaning vs Exploratory #9

Open
qihan-z opened this issue Mar 23, 2022 · 1 comment

qihan-z commented Mar 23, 2022

Hi Dr. McGowan,

I'm using your tidycode package for my independent study. I ran it on one of my R scripts written in tidyverse syntax and compared the result to my own (eyeballed) classification. There is one discrepancy: I would classify some of the functions as "Exploratory", but tidycode classified them as "Data Cleaning". I recreated those lines using the built-in mtcars dataset and got the same result (functions such as summarize() and mean() are classified as "Data Cleaning" rather than "Exploratory"):

```r
library(tidyverse)
data(mtcars)

mtcars %>% summarize(mean(hp, na.rm = TRUE))
mtcars %>% group_by(cyl) %>% summarize(mean(wt, na.rm = TRUE))
```

Does the package classify all dplyr functions as "Data Cleaning"? Is there any way to remedy this? Thank you.

LucyMcGowan (Owner) commented Mar 23, 2022

The classifications are based on crowdsourced classifications (the "score" is the proportion of classifications that assigned a given function to that class). If you want them for a specific purpose, you can create your own classification lexicon and apply that instead. The package doesn't classify all dplyr functions as "Data Cleaning", but the classification method could definitely be improved! One idea would be to have various "lexicons" based on context. Currently we only have two: one crowdsourced from the "general public" (participants recruited mostly via Twitter) and one created by members of Jeff Leek's lab at the time.
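The two ideas in the reply above can be sketched in base R: a "score" computed as the proportion of votes each class received for a function, and a custom lexicon applied by looking up function names. This is a minimal illustration under stated assumptions, not tidycode's implementation. The vote data, the lexicon contents, and the helpers `score_votes()`, `top_class()`, and `classify_calls()` are all hypothetical, and the column names `func`, `classification`, and `score` were chosen for illustration and may not match tidycode's actual lexicon format.

```r
# Toy vote data (hypothetical, NOT tidycode's real crowdsourced data):
# each row is one participant's label for one function.
votes <- data.frame(
  func = c("summarize", "summarize", "summarize", "mean", "mean", "group_by"),
  classification = c("Data Cleaning", "Data Cleaning", "Exploratory",
                     "Exploratory", "Exploratory", "Data Cleaning"),
  stringsAsFactors = FALSE
)

# Score = proportion of a function's votes that went to a given class.
score_votes <- function(votes) {
  counts <- as.data.frame(
    table(func = votes$func, classification = votes$classification),
    stringsAsFactors = FALSE
  )
  counts <- counts[counts$Freq > 0, ]  # drop function/class pairs nobody chose
  counts$score <- counts$Freq / ave(counts$Freq, counts$func, FUN = sum)
  out <- counts[order(counts$func, -counts$score),
                c("func", "classification", "score")]
  rownames(out) <- NULL
  out
}

# Keep only the highest-scoring class per function
# (on a tie, whichever row sorts first wins).
top_class <- function(scored) {
  scored <- scored[order(scored$func, -scored$score), ]
  scored[!duplicated(scored$func), ]
}

# Hypothetical custom lexicon that treats summary functions as "Exploratory".
my_lexicon <- data.frame(
  func = c("summarize", "mean", "group_by"),
  classification = c("Exploratory", "Exploratory", "Data Cleaning"),
  stringsAsFactors = FALSE
)

# Look each function name up in a lexicon; unknown functions get NA.
classify_calls <- function(funcs, lexicon) {
  data.frame(
    func = funcs,
    classification = lexicon$classification[match(funcs, lexicon$func)],
    stringsAsFactors = FALSE
  )
}

scored <- score_votes(votes)
top_class(scored)
# Function names from the issue's example, listed by hand
classify_calls(c("summarize", "mean", "group_by", "summarize", "mean"), my_lexicon)
```

With these toy votes, `summarize` scores 2/3 for "Data Cleaning" and 1/3 for "Exploratory". Keeping only the top class therefore labels it "Data Cleaning", even though a third of the voters disagreed. A custom lexicon avoids the vote entirely for the functions you care about.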
