Get SQL type of a column #393

Open
mgirlich opened this issue Aug 19, 2022 · 2 comments

@mgirlich

It would be useful to somehow get the SQL type of a column, e.g. via dbColumnInfo(). There is also an old issue #78, but that was never really resolved.
(By the way, this is already possible for Postgres.)

This would be useful for the rows_*() methods in dbplyr. For example, in Postgres I work with array or json(b) columns, and because Postgres is pretty strict about types, I need explicit casts to the correct type. I guess the same would make sense for other databases as well.
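
To illustrate the Postgres case mentioned above, here is a rough sketch (the table name some_table is just a placeholder): the information_schema.columns view already exposes both the standard data_type and the Postgres-specific udt_name:

con <- DBI::dbConnect(RPostgres::Postgres())

# information_schema.columns carries the portable `data_type` as well as the
# Postgres-specific `udt_name` (e.g. "jsonb", "_text" for text[], custom enums)
DBI::dbGetQuery(con, "
  SELECT column_name, data_type, udt_name
  FROM information_schema.columns
  WHERE table_name = 'some_table'
  ORDER BY ordinal_position
")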

@krlmlr
Member

krlmlr commented Aug 21, 2022

We can do a bit of this today with dm:::dm_meta_raw():

library(conflicted)
library(dm)
library(tidyverse)

con <- dm:::financial_db_con()

dm <- dm:::dm_meta_raw(con, NULL)
dm$columns %>%
  count(data_type, character_maximum_length, sort = TRUE)
#> # Source:     SQL [?? x 3]
#> # Database:   mysql  [[email protected]:NA/Financial_ijs]
#> # Groups:     data_type
#> # Ordered by: desc(n)
#>    data_type character_maximum_length       n
#>    <chr>                      <int64> <int64>
#>  1 int                             NA    7137
#>  2 varchar                        255    1964
#>  3 decimal                         NA    1326
#>  4 bigint                          NA    1205
#>  5 varchar                         64     535
#>  6 timestamp                       NA     488
#>  7 float                           NA     470
#>  8 datetime                        NA     364
#>  9 double                          NA     343
#> 10 date                            NA     340
#> # … with more rows

Created on 2022-08-21 by the reprex package (v2.0.1)

The translation from data type to R vector is a separate issue; @nbenn might have a few ideas.
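
Purely as an illustration (nothing dm or DBI ships), a mapping from information_schema data_type values to R prototypes could start out roughly like this, with many backend-specific cases missing:

sql_to_r_ptype <- function(data_type) {
  # very incomplete sketch: map a few common data_type values to R prototypes
  switch(
    data_type,
    int       = integer(),
    bigint    = bit64::integer64(0),
    decimal   = ,
    float     = ,
    double    = numeric(),
    varchar   = character(),
    date      = as.Date(character()),
    datetime  = ,
    timestamp = as.POSIXct(character(), tz = "UTC"),
    character()  # fallback for everything else
  )
}

sql_to_r_ptype("varchar")
#> character(0)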

Would you like to share a reprex of your current RPostgres workflow, and where it is lacking?

@mgirlich
Author

Here is a not-so-short reprex:

con <- DBI::dbConnect(RPostgres::Postgres())

# db setup
DBI::dbExecute(con, "CREATE TYPE dbi_enum AS ENUM ('a', 'b');")
#> [1] 0
DBI::dbExecute(con, "CREATE SEQUENCE IF NOT EXISTS dbi_test_sequence;")
#> [1] 0
DBI::dbExecute(
  con,
  "CREATE TABLE dbi_test (
    id bigint DEFAULT nextval('dbi_test_sequence'::regclass) PRIMARY KEY,
    enum_col dbi_enum,
    jsonb_col jsonb,
    array_col text[]
  );"
)
#> [1] 0

# append table
df <- tibble::tibble(
  enum_col = "a",
  jsonb_col = '{"a": 1, "b": [1, 2]}',
  array_col = '{"a", "b"}'
)

# praise DBI/RPostgres that this works nicely :-)
DBI::dbAppendTable(
  con,
  "dbi_test",
  df,
  copy = FALSE
)
#> [1] 1

# blindly copy `df` to the db ...
df_db <- dplyr::copy_to(con, df, "df_db")
# ... and appending fails :-(
dplyr::rows_append(
  dplyr::tbl(con, "dbi_test"),
  df_db
)
#> Error: Failed to prepare query: ERROR:  UNION types dbi_enum and text cannot be matched
#> LINE 9:     SELECT NULL AS "id", *
#>                                  ^

# we need to setup the table with explicit types
types <- c(
  enum_col = "dbi_enum",
  jsonb_col = "jsonb",
  array_col = "text[]"
)
df_db2 <- dplyr::copy_to(con, df, "df_db2", types = types, overwrite = TRUE)

dplyr::rows_append(
  dplyr::tbl(con, "dbi_test"),
  df_db2,
  in_place = TRUE
)

# get the types
# 1. attempt: use `dbDataType()`
dbi_test_local <- DBI::dbReadTable(con, "dbi_test")
# nope
DBI::dbDataType(con, dbi_test_local)
#>        id  enum_col jsonb_col array_col 
#>  "BIGINT"    "TEXT"    "TEXT"    "TEXT"

# 2. attempt: use `dbColumnInfo()`
# table name does not work :-(
DBI::dbColumnInfo(con, "dbi_test")
#> Error in (function (classes, fdef, mtable) : unable to find an inherited method for function 'dbColumnInfo' for signature '"PqConnection"'

res <- DBI::dbSendQuery(con, "SELECT * FROM dbi_test WHERE 0 = 1;")
DBI::dbColumnInfo(res)
#>        name      type   .oid .known .typname
#> 1        id    double     20   TRUE     int8
#> 2  enum_col character 397439  FALSE     <NA>
#> 3 jsonb_col character   3802  FALSE    jsonb
#> 4 array_col character   1009  FALSE    _text

# the `type` column is standard in DBI but contains the R type
# the `.typname` column is RPostgres specific and contains what I want

# clean up
DBI::dbExecute(con, "DROP TABLE dbi_test;")
#> Warning in result_create(conn@ptr, statement, immediate): Closing open result
#> set, cancelling previous query
#> [1] 0
DBI::dbExecute(con, "DROP SEQUENCE dbi_test_sequence;")
#> [1] 0
DBI::dbExecute(con, "DROP TYPE dbi_enum CASCADE;")
#> NOTICE:  drop cascades to column enum_col of table df_db2
#> [1] 0

Created on 2022-08-22 with reprex v2.0.2
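
To make that last pattern reusable, a small helper along these lines could wrap it (db_sql_types() is a made-up name, and .typname is RPostgres-specific, so this only works there):

# hypothetical helper, not part of DBI: prepare a zero-row query and read the
# column metadata from dbColumnInfo()
db_sql_types <- function(con, table) {
  sql <- paste0(
    "SELECT * FROM ", DBI::dbQuoteIdentifier(con, table), " WHERE 0 = 1"
  )
  res <- DBI::dbSendQuery(con, sql)
  on.exit(DBI::dbClearResult(res))
  info <- DBI::dbColumnInfo(res)
  stats::setNames(info$.typname, info$name)
}

# for the table above this should give something like
# c(id = "int8", enum_col = NA, jsonb_col = "jsonb", array_col = "_text"),
# i.e. the enum is still missing
db_sql_types(con, "dbi_test")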

My goals for dbplyr:

  • programmatically get the SQL column types of a table
  • add a types argument to copy_inline()
  • allow types of copy_to() and copy_inline() to accept another tbl object to learn the types from
  • in rows_*() cast the columns of y to the types of x

Currently, there is no easy way to do this with DBI: dbColumnInfo() only gives me the SQL type for Postgres, and even there it is not complete (see my reprex and the lack of support for enums).

I had a look at dm_meta_raw(), which looks nice. Is information_schema.columns part of the SQL standard? At least for Postgres, data_type is not sufficient; we also need udt_name (due to custom types like my enum above).
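
For the last bullet, a rough sketch of what I have in mind (cast_like() is a made-up helper, using dbplyr's sql() escape hatch): given the SQL types of the target, cast the columns of y before the UNION is built:

# illustrative only: not existing dbplyr API
cast_like <- function(y, types) {
  con <- dbplyr::remote_con(y)
  # build one CAST(...) expression per column whose target type is known
  casts <- lapply(names(types), function(col) {
    dbplyr::sql(paste0(
      "CAST(", DBI::dbQuoteIdentifier(con, col), " AS ", types[[col]], ")"
    ))
  })
  names(casts) <- names(types)
  dplyr::mutate(y, !!!casts)
}

# with e.g. types <- c(enum_col = "dbi_enum", jsonb_col = "jsonb", array_col = "text[]")
# the append from the reprex should then work without the UNION type error:
# dplyr::rows_append(dplyr::tbl(con, "dbi_test"), cast_like(df_db, types), in_place = TRUE)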
