Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

slice_sample() does not interact correctly with set.seed() when working with an SQLite database #466

Open
stephenashton-dhsc opened this issue Aug 7, 2023 · 1 comment

Comments

@stephenashton-dhsc
Copy link

I'm not too sure if this issue sits within dbplyr or RSQLite, but the slice_sample() function does not appear to be taking account of a seed set via set.seed() when using an SQLite database.

If I set a seed, then run slice_sample(), then later set the same seed, and run slice_sample() again, the sampled table is different.

For comparison, when undertaking these sample operations using a Microsoft SQL Server database, the expected behaviour is observed (i.e., the two sampled tables are identical).

library(DBI)
library(RSQLite)
library(dplyr)
library(dbplyr)


tbl <- tibble(n = 1:30)
conn <- dbConnect(SQLite(), ":memory:")
dbWriteTable(conn, "test_table", tbl)

set.seed(100)
sample1 <- slice_sample(tbl(conn, "test_table"), n = 3)

set.seed(100)
sample2 <- slice_sample(tbl(conn, "test_table"), n = 3)

identical(sample1, sample2)

This has been raised on dbplyr as tidyverse/dbplyr#1344

@krlmlr
Copy link
Member

krlmlr commented Aug 7, 2023

options(conflicts.policy = list(warn = FALSE))

library(DBI)
library(RSQLite)
library(dplyr)
library(dbplyr)


tbl <- tibble(n = 1:30)
conn <- dbConnect(SQLite(), ":memory:")
dbWriteTable(conn, "test_table", tbl)

set.seed(100)
slice_sample(tbl(conn, "test_table"), n = 3) |> show_query()
#> <SQL>
#> SELECT `n`
#> FROM (
#>   SELECT *, ROW_NUMBER() OVER (ORDER BY RANDOM()) AS `q02`
#>   FROM `test_table`
#> )
#> WHERE (`q02` <= 3)

Created on 2023-08-07 with reprex v2.0.2

We'd need a deterministic version of RANDOM(), which doesn't seem to exist at the time of writing. We'll need either an SQLite extension or a patch in SQLite itself.

https://stackoverflow.com/a/24394275/946850

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants