Get SQL type of a column #393

Open
mgirlich opened this issue Aug 19, 2022 · 2 comments

@mgirlich

It would be useful to somehow get the SQL type of a column, e.g. via dbColumnInfo(). There is also an old issue #78, but that was never really resolved.
(By the way, this is already possible for Postgres.)

This would be useful for the rows_*() methods in dbplyr. For example, in Postgres I work with array or json(b) columns, and because Postgres is pretty strict about types, I need explicit casts to the correct type. I guess the same would make sense for other databases as well.
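
To illustrate the Postgres case mentioned above, here is a rough sketch (the table name some_table is just a placeholder): the information_schema.columns view already exposes both the standard data_type and the Postgres-specific udt_name:

con <- DBI::dbConnect(RPostgres::Postgres())

# information_schema.columns carries the portable `data_type` as well as the
# Postgres-specific `udt_name` (e.g. "jsonb", "_text" for text[], custom enums)
DBI::dbGetQuery(con, "
  SELECT column_name, data_type, udt_name
  FROM information_schema.columns
  WHERE table_name = 'some_table'
  ORDER BY ordinal_position
")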

@krlmlr
Member

krlmlr commented Aug 21, 2022

We can do a bit of this today with dm:::dm_meta_raw():

library(conflicted)
library(dm)
library(tidyverse)

con <- dm:::financial_db_con()

dm <- dm:::dm_meta_raw(con, NULL)
dm$columns %>%
  count(data_type, character_maximum_length, sort = TRUE)
#> # Source:     SQL [?? x 3]
#> # Database:   mysql  [[email protected]:NA/Financial_ijs]
#> # Groups:     data_type
#> # Ordered by: desc(n)
#>    data_type character_maximum_length       n
#>    <chr>                      <int64> <int64>
#>  1 int                             NA    7137
#>  2 varchar                        255    1964
#>  3 decimal                         NA    1326
#>  4 bigint                          NA    1205
#>  5 varchar                         64     535
#>  6 timestamp                       NA     488
#>  7 float                           NA     470
#>  8 datetime                        NA     364
#>  9 double                          NA     343
#> 10 date                            NA     340
#> # … with more rows

Created on 2022-08-21 by the reprex package (v2.0.1)

The translation from data type to R vector is a separate issue; @nbenn might have a few ideas.
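
Purely as an illustration (nothing dm or DBI ships), a mapping from information_schema data_type values to R prototypes could start out roughly like this, with many backend-specific cases missing:

sql_to_r_ptype <- function(data_type) {
  # very incomplete sketch: map a few common data_type values to R prototypes
  switch(
    data_type,
    int       = integer(),
    bigint    = bit64::integer64(0),
    decimal   = ,
    float     = ,
    double    = numeric(),
    varchar   = character(),
    date      = as.Date(character()),
    datetime  = ,
    timestamp = as.POSIXct(character(), tz = "UTC"),
    character()  # fallback for everything else
  )
}

sql_to_r_ptype("varchar")
#> character(0)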

Would you like to share a reprex of your current RPostgres workflow, and where it is lacking?

@mgirlich
Author

Here is a not-so-short reprex:

con <- DBI::dbConnect(RPostgres::Postgres())

# db setup
DBI::dbExecute(con, "CREATE TYPE dbi_enum AS ENUM ('a', 'b');")
#> [1] 0
DBI::dbExecute(con, "CREATE SEQUENCE IF NOT EXISTS dbi_test_sequence;")
#> [1] 0
DBI::dbExecute(
  con,
  "CREATE TABLE dbi_test (
    id bigint DEFAULT nextval('dbi_test_sequence'::regclass) PRIMARY KEY,
    enum_col dbi_enum,
    jsonb_col jsonb,
    array_col text[]
  );"
)
#> [1] 0

# append table
df <- tibble::tibble(
  enum_col = "a",
  jsonb_col = '{"a": 1, "b": [1, 2]}',
  array_col = '{"a", "b"}'
)

# praise DBI/RPostgres that this works nicely :-)
DBI::dbAppendTable(
  con,
  "dbi_test",
  df,
  copy = FALSE
)
#> [1] 1

# blindly copy `df` to the db ...
df_db <- dplyr::copy_to(con, df, "df_db")
# ... and appending fails :-(
dplyr::rows_append(
  dplyr::tbl(con, "dbi_test"),
  df_db
)
#> Error: Failed to prepare query: ERROR:  UNION types dbi_enum and text cannot be matched
#> LINE 9:     SELECT NULL AS "id", *
#>                                  ^

# we need to setup the table with explicit types
types <- c(
  enum_col = "dbi_enum",
  jsonb_col = "jsonb",
  array_col = "text[]"
)
df_db2 <- dplyr::copy_to(con, df, "df_db2", types = types, overwrite = TRUE)

dplyr::rows_append(
  dplyr::tbl(con, "dbi_test"),
  df_db2,
  in_place = TRUE
)

# get the types
# 1. attempt: use `dbDataType()`
dbi_test_local <- DBI::dbReadTable(con, "dbi_test")
# nope
DBI::dbDataType(con, dbi_test_local)
#>        id  enum_col jsonb_col array_col 
#>  "BIGINT"    "TEXT"    "TEXT"    "TEXT"

# 2. attempt: use `dbColumnInfo()`
# table name does not work :-(
DBI::dbColumnInfo(con, "dbi_test")
#> Error in (function (classes, fdef, mtable) : unable to find an inherited method for function 'dbColumnInfo' for signature '"PqConnection"'

res <- DBI::dbSendQuery(con, "SELECT * FROM dbi_test WHERE 0 = 1;")
DBI::dbColumnInfo(res)
#>        name      type   .oid .known .typname
#> 1        id    double     20   TRUE     int8
#> 2  enum_col character 397439  FALSE     <NA>
#> 3 jsonb_col character   3802  FALSE    jsonb
#> 4 array_col character   1009  FALSE    _text

# the `type` column is standard in DBI but contains the R type
# the `.typname` column is RPostgres specific and contains what I want

# clean up
DBI::dbExecute(con, "DROP TABLE dbi_test;")
#> Warning in result_create(conn@ptr, statement, immediate): Closing open result
#> set, cancelling previous query
#> [1] 0
DBI::dbExecute(con, "DROP SEQUENCE dbi_test_sequence;")
#> [1] 0
DBI::dbExecute(con, "DROP TYPE dbi_enum CASCADE;")
#> NOTICE:  drop cascades to column enum_col of table df_db2
#> [1] 0

Created on 2022-08-22 with reprex v2.0.2
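
To make that last pattern reusable, a small helper along these lines could wrap it (db_sql_types() is a made-up name, and .typname is RPostgres-specific, so this only works there):

# hypothetical helper, not part of DBI: prepare a zero-row query and read the
# column metadata from dbColumnInfo()
db_sql_types <- function(con, table) {
  sql <- paste0(
    "SELECT * FROM ", DBI::dbQuoteIdentifier(con, table), " WHERE 0 = 1"
  )
  res <- DBI::dbSendQuery(con, sql)
  on.exit(DBI::dbClearResult(res))
  info <- DBI::dbColumnInfo(res)
  stats::setNames(info$.typname, info$name)
}

# for the table above this should give something like
# c(id = "int8", enum_col = NA, jsonb_col = "jsonb", array_col = "_text"),
# i.e. the enum is still missing
db_sql_types(con, "dbi_test")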

My goals for dbplyr:

  • programmatically get the SQL column types of a table
  • add a types argument to copy_inline()
  • allow types of copy_to() and copy_inline() to accept another tbl object to learn the types from
  • in rows_*() cast the columns of y to the types of x

Currently, there is no easy way to do this with DBI: dbColumnInfo() only gives me the SQL type for Postgres, and even there it is not complete (see my reprex and the lack of support for enums).

I had a look at dm_meta_raw(), which looks nice. Is information_schema.columns part of the SQL standard? At least for Postgres, data_type is not sufficient; we also need udt_name (due to custom types like my enum above).
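
For the last bullet, a rough sketch of what I have in mind (cast_like() is a made-up helper, using dbplyr's sql() escape hatch): given the SQL types of the target, cast the columns of y before the UNION is built:

# illustrative only: not existing dbplyr API
cast_like <- function(y, types) {
  con <- dbplyr::remote_con(y)
  # build one CAST(...) expression per column whose target type is known
  casts <- lapply(names(types), function(col) {
    dbplyr::sql(paste0(
      "CAST(", DBI::dbQuoteIdentifier(con, col), " AS ", types[[col]], ")"
    ))
  })
  names(casts) <- names(types)
  dplyr::mutate(y, !!!casts)
}

# with e.g. types <- c(enum_col = "dbi_enum", jsonb_col = "jsonb", array_col = "text[]")
# the append from the reprex should then work without the UNION type error:
# dplyr::rows_append(dplyr::tbl(con, "dbi_test"), cast_like(df_db, types), in_place = TRUE)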
