Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Feature request: multiple grouping variables #279

Open
okenk opened this issue Nov 10, 2021 · 3 comments
Open

Feature request: multiple grouping variables #279

okenk opened this issue Nov 10, 2021 · 3 comments
Labels

Comments

@okenk
Copy link

okenk commented Nov 10, 2021

Hi, thanks so much for the package, it's a huge help! I am cross-posting this issue on the stan forums in case there is a solution I am missing. Essentially I am trying to visualize the posterior predictive distributions across two grouping variables instead of one. I am using bayesplot::ppc_violin_grouped() which is working like a charm for the first grouping variable, but then I'd like to facet by the second grouping variable.

I thought I might be able to just add this to an existing plot using facet_wrap(). I looked at the data component of the bayesplot object and first tried to emulate that. However, I got an error that the length was wrong. If I then change the length of the facet variable to what the error asks for, it then wants a different length. Minimally reproducible example:

mod <- brms::brm(mpg ~ cyl + gear, data = mtcars, iter = 200)
ppc <- brms::posterior_predict(mod)
base.plot <- bayesplot::ppc_violin_grouped(y = mtcars$mpg, yrep = ppc, group = mtcars$gear) 

# try to replicate structure of base.plot$data
base.plot +
  facet_wrap(~c(rep(mtcars$cyl, each = 400), mtcars$cyl))

# change based on resulting error
base.plot +
  facet_wrap(~rep(mtcars$cyl, each = 400))

# change based on resulting error
base.plot +
  facet_wrap(~mtcars$cyl)
@jgabry jgabry added the feature label Nov 11, 2021
@jgabry
Copy link
Member

jgabry commented Nov 11, 2021

Yeah unfortunately it's not straightforward to add a second grouping variable right now, but this is definitely a feature I'd like to add at some point. Until then, I think you can kind of hack it. Here's one way:

library(bayesplot)
library(dplyr)  # will use %>% and left_join

# - use the ppc_data function to get the data baseplot used internally (this is different than the data 
# constructed by ggplot2 in base.plot$data)
# - add the second grouping variable (in this case cyl) and join by y_id
# - in the new data frame plot_data now has a variable "group2" now (it already had a variable "group")
plot_data <- 
  ppc_data(y = mtcars$mpg, yrep = ppc, group = mtcars$gear) %>%
  left_join(data.frame(group2 = mtcars$cyl, y_id = 1:nrow(mtcars)), by = "y_id")

head(plot_data)
# A tibble: 6 × 9
  group  y_id y_name rep_id rep_label            is_y  is_y_label     value group2
  <fct> <int> <fct>   <int> <fct>                <lgl> <fct>          <dbl>  <dbl>
1 4         1 1           1 italic(y)[rep] ( 1 ) FALSE italic(y)[rep]  21.4      6
2 4         1 1           2 italic(y)[rep] ( 2 ) FALSE italic(y)[rep]  17.3      6
3 4         1 1           3 italic(y)[rep] ( 3 ) FALSE italic(y)[rep]  18.3      6
4 4         1 1           4 italic(y)[rep] ( 4 ) FALSE italic(y)[rep]  18.2      6
5 4         1 1           5 italic(y)[rep] ( 5 ) FALSE italic(y)[rep]  22.2      6
6 4         1 1           6 italic(y)[rep] ( 6 ) FALSE italic(y)[rep]  20.6      6

Now we can replace the data in base.plot with plot_data and then you should be able to use facet_wrap or facet_grid with two variables:

# - use %+% to replace the data in base.plot with plot_data
# - then add facet_grid or facet_wrap (I'm not sure which you want here, maybe facet_grid actually)
base.plot %+% plot_data + facet_grid(c("group", "group2"))

All together the code is:

plot_data <- 
  ppc_data(y = mtcars$mpg, yrep = ppc, group = mtcars$gear) %>%
  left_join(data.frame(group2 = mtcars$cyl, y_id = 1:nrow(mtcars)), by = "y_id")
base.plot %+% plot_data + facet_wrap(c("group", "group2"))

There may be a better way to do this, but I think this should at least work. In the future hopefully we can add an argument to the plotting function for a second grouping variable or allow a different way to specify multiple grouping variables.

@okenk
Copy link
Author

okenk commented Nov 12, 2021

Thanks! Would be a cool feature for diagnosing model misspecifications, but in the meantime this worked.

@shirdekel
Copy link

@jgabry How would you do the same but for ppc_bars_grouped? I'm struggling to join the second grouping variable because I can't get an ID for each rep.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

3 participants