Support for handling censored data in PPC plots via imputation #319

Open
avehtari opened this issue Feb 5, 2024 · 2 comments

avehtari (Contributor) commented Feb 5, 2024

Currently the only plot type that supports censored data is ppc_km_overlay (plus ppc_km_overlay_grouped), which overlays Kaplan-Meier curves and is an excellent idea. Other plots can be useful too, and we can provide a generic approach by treating the censored observations as missing data and generating imputed y values.

Here's simple code for the imputation (it doesn't check that at least one predictive draw exceeds the censoring time, so that there is something to sample from):

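# posterior predictive draws: one row per draw, one column per observation
yrep <- posterior_predict(fit_lognormal, ndraws=4000)
# censored y: one draw above the censoring time (picked by index, as sample() misbehaves on a length-1 vector); observed y: kept as is
yimp <- sapply(1:N, \(i) if (x$is_censored[i]) {s <- yrep[yrep[,i]>x$time[i],i]; s[sample.int(length(s),1)]} else x$time[i])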
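A slightly more defensive version of the same idea can be sketched as below. The helper name `impute_censored()` is hypothetical (it is not part of bayesplot, rstanarm or brms), and the data are simulated stand-ins for `x$time`, `x$is_censored` and the `posterior_predict()` output, so that the snippet runs on its own with base R only.

```r
# Hypothetical helper, not an existing bayesplot function: impute
# right-censored observations from posterior predictive draws.
impute_censored <- function(time, is_censored, yrep) {
  stopifnot(length(time) == ncol(yrep), length(time) == length(is_censored))
  vapply(seq_along(time), function(i) {
    # Observed values are kept unchanged
    if (!is_censored[i]) return(time[i])
    # Predictive draws consistent with the censoring (y > censoring time)
    candidates <- yrep[yrep[, i] > time[i], i]
    if (length(candidates) == 0) {
      stop("no predictive draws above the censoring time for observation ", i)
    }
    # Sample an index rather than the values, which is safe for length 1
    candidates[sample.int(length(candidates), 1)]
  }, numeric(1))
}

# Simulated stand-ins (assumptions, not the data used in this issue)
set.seed(1)
N <- 50
time <- rlnorm(N, meanlog = 1, sdlog = 0.5)
is_censored <- rbinom(N, 1, 0.3) == 1
yrep <- matrix(rlnorm(4000 * N, meanlog = 1, sdlog = 0.8), nrow = 4000)

yimp <- impute_censored(time, is_censored, yrep)
```

Failing loudly when no draw exceeds the censoring time is one possible design choice; another would be to increase the number of draws and retry.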

The plots below show the data without and with imputation:

ppc_intervals(y=y, yrep=yrep) + scale_y_log10()
ppc_intervals(y=yimp, yrep=yrep) + scale_y_log10()

image
image

Without imputation, the plot doesn't make sense, as discussed before. With imputation, the plot makes sense and also reveals a group of observations with similar indices (likely from the same sub-study).

Imputed y can be used with any plot, even with ppc_km_overlay, just to illustrate that the imputation is not breaking anything (but otherwise it's better to show the km plot without imputation):

ppc_km_overlay(y=y, yrep=yrep[seq(1,4000,length.out=20),], status_y=1-x$is_censored) + scale_x_log10()
ppc_km_overlay(y=yimp, yrep=yrep[seq(1,4000,length.out=20),], status_y=rep(1,N)) + scale_x_log10()
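To see why passing the imputed y with `status_y=rep(1,N)` is harmless, note that with no censored observations the Kaplan-Meier estimate reduces to one minus the empirical CDF. The sketch below checks this with a hand-written estimator; `km_survival()` is a hypothetical helper written for illustration (bayesplot itself does not expose it), and the data are simulated.

```r
# Hypothetical minimal Kaplan-Meier estimator for illustration only
km_survival <- function(time, status) {
  event_times <- sort(unique(time[status == 1]))
  factors <- vapply(event_times, function(t) {
    at_risk <- sum(time >= t)                 # still under observation at t
    events <- sum(time == t & status == 1)    # uncensored events at t
    1 - events / at_risk
  }, numeric(1))
  data.frame(time = event_times, surv = cumprod(factors))
}

# Simulated "imputed" data: every observation is treated as an event
set.seed(3)
N <- 40
yimp <- rlnorm(N, meanlog = 1, sdlog = 0.5)
km <- km_survival(yimp, status = rep(1, N))

# With all status values equal to 1, the KM curve is 1 - ECDF
km_matches_ecdf <- isTRUE(all.equal(km$surv, 1 - ecdf(yimp)(km$time)))
```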

image
image

bayesplot functions take y and yrep as arguments, so the imputation would be done by the user, or by rstanarm or brms. I'm opening this issue because we could add an argument to bayesplot functions that indicates which elements of y are imputed, and the imputed y values would then be shown in a different color to help see how much the imputation affects the plot. This would also work in km and pit_ecdf plots, where the curves would show more and more of the imputed color toward the right.
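As a rough illustration of the proposal, the intervals plot could be mocked up directly with ggplot2. This is only a sketch under stated assumptions: the `is_imputed` indicator, the column names, the colors and the 90% interval are all made up here, and no such argument exists in bayesplot at the time of writing. The data are simulated so that the snippet is self-contained, and the plotting part is skipped if ggplot2 is not installed.

```r
# Simulated stand-ins for y (after imputation), the imputation indicator,
# and posterior predictive draws (assumptions, not data from this issue)
set.seed(2)
N <- 30
y <- rlnorm(N, meanlog = 1, sdlog = 0.5)
is_imputed <- rbinom(N, 1, 0.3) == 1
yrep <- matrix(rlnorm(1000 * N, meanlog = 1, sdlog = 0.6), nrow = 1000)

# One row per observation: predictive interval, y, and whether y is imputed
plot_data <- data.frame(
  index = seq_len(N),
  y = y,
  imputed = factor(is_imputed, levels = c(FALSE, TRUE),
                   labels = c("observed", "imputed")),
  lower = apply(yrep, 2, quantile, probs = 0.05),
  upper = apply(yrep, 2, quantile, probs = 0.95)
)

if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(plot_data, aes(x = index)) +
    geom_linerange(aes(ymin = lower, ymax = upper), color = "grey70") +
    # Imputed y values get their own color
    geom_point(aes(y = y, color = imputed)) +
    scale_color_manual(values = c(observed = "black", imputed = "red")) +
    scale_y_log10()
  print(p)
}
```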

TeemuSailynoja (Collaborator) commented Feb 6, 2024

Something like this?
image

avehtari (Contributor, Author) commented Feb 7, 2024

Improved plot examples:

  • intervals
    image

  • km_overlay
    image

  • PIT-ECDF
    image

The colors are not necessarily the best, but otherwise I like these.
