Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

chore: refactor dbListTables() et al. #413

Merged
merged 1 commit into from
Apr 1, 2024

Conversation

dpprdan
Copy link
Contributor

@dpprdan dpprdan commented Nov 29, 2022

I would like to take another stab at #251 and this some preparatory work for that.

Background

Materialized views are currently not returned by dbListTables() et al (#251). Materialized Views are not included in INFORMATION_SCHEMA.tables (see this thread for why they are not. Tl;dr: "They are not defined by the SQL standard.")

The information schema consists of a set of views that contain information about the objects defined in the current database.
The information schema is defined in the SQL standard and can therefore be expected to be portable and remain stable — unlike the system catalogs, which are specific to PostgreSQL and are modeled after implementation concerns.
The information schema views do not, however, contain information about PostgreSQL-specific features; to inquire about those you need to query the system catalogs or other PostgreSQL-specific views.
https://www.postgresql.org/docs/current/information-schema.html (emphasis mine).

Sidenote: Most if not all tables in information_schema are just views of the system catalogs, see e.g.

SELECT definition FROM pg_views 
WHERE schemaname = 'information_schema' AND viewname = 'tables';

Since mviews are PostgreSQL-specific features we need to query the system catalogs. In particular, we need to use pg_class/pg_namespace instead of INFORMATION_SCHEMA.tables.

However, in order to retain support for Redshift, we need to keep the INFORMATION_SCHEMA queries as well.

What happens in this PR?

The above affects the queries that underly the dbListTables(), dbExistsTable(), dbListObjects(), and dbListFields() functions.

dbListTables(), dbExistsTable() and dbListObjects() do essentially the same thing, namely call INFORMATION_SCHEMA.tables.
In order to make the abovementioned adjustments easier and keep duplicated code at a minimum, I refactored those functions to use a common core function list_tables(), which returns the SQL code to query INFORMATION_SCHEMA.tables. The (alternative) queries to pg_class (to be added in another PR) will then live here.

I refactored the find_table() and list_fields() functions that underly dbListFields() in a similar fashion.

Details

dbListFields(): find the first table on the search path

SELECT column_name FROM (
  SELECT *, rank() OVER (ORDER BY nr) AS rnr FROM (
    SELECT nr, schemas[nr] AS table_schema FROM (
      SELECT *, generate_subscripts(schemas, 1) AS nr FROM (
        SELECT current_schemas(true) AS schemas) t
      ) tt WHERE schemas[nr] <> 'pg_catalog'
    ) ttt 
  INNER JOIN INFORMATION_SCHEMA.columns USING (table_schema) WHERE table_name = [table_name]
) tttt 
WHERE rnr = 1 ORDER BY ordinal_position

can be re-written as (with 2 instead of 4 nested queries)

SELECT column_name FROM (
  SELECT *, rank() OVER (ORDER BY schema_nr) AS schema_rank FROM (
    SELECT * FROM unnest(current_schemas(true)) WITH ORDINALITY AS "tbl"("table_schema","schema_nr") 
    WHERE "table_schema" != 'pg_catalog') t 
  INNER JOIN INFORMATION_SCHEMA.columns USING (table_schema) 
  WHERE table_name = [table_name]) tt 
WHERE schema_rank = 1 
ORDER BY ordinal_position

Essentially I applied similar query code here as in #261, but calling information_schema instead of the system catalogs.
I will open another PR to incorporate the queries to the system catalogs.

It is probably easiest to review this by going through the individual commits.

I do not have access to Redshift, so I was not able to test this PR there. There might be small adjustments necessary (see below). Is it somehow possible for me to test on Redshift (e.g. via GHA)? Or how does this work in general?

Related issues

#251
#261

This does not address #388. I think it is best to address this after #251 is resolved.

#390 is also related, but relatively trivial to apply before or after this.

@@ -156,25 +206,31 @@ find_table <- function(conn, id, inf_table = "tables", only_first = FALSE) {
)
only_first <- FALSE
Copy link
Contributor Author

@dpprdan dpprdan Nov 29, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This override was already present in find_table() (but find_table() was generally called with only_first = TRUE by list_fields()).

I think this is a remainder of the first two commits in #326, where indeed only_first isn't necessary, because that used current_schema() (not "schema_s_").

It should be only_first = TRUE now though because there can be multiple schemas on the search path on Redshift as well (I think).

@krlmlr
Copy link
Member

krlmlr commented Mar 16, 2023

Conflicts here, too.

@dpprdan dpprdan changed the title refactor dbListTables() et al. chore: refactor dbListTables() et al. Mar 16, 2023
Copy link
Contributor

aviator-app bot commented Nov 9, 2023

Current Aviator status

Aviator will automatically update this comment as the status of the PR changes.
Comment /aviator refresh to force Aviator to re-examine your PR (or learn about other /aviator commands).

This PR was merged using Aviator.


See the real-time status of this PR on the Aviator webapp.
Use the Aviator Chrome Extension to see the status of your PR within GitHub.

@krlmlr krlmlr force-pushed the feat/refactor_list branch 3 times, most recently from 607d2df to 356b790 Compare April 1, 2024 13:57
@krlmlr
Copy link
Member

krlmlr commented Apr 1, 2024

Thanks!

Original: 7730f13

refactor `dbListTables()` with `list_tables()`, now orders result by `table_type` and `table_name`
refactor `dbExistsTable()` with `list_tables()`
refactor `dbListObjects()` with `list_tables()`
merge `find_table()` code into `list_fields()`
`find_table()` isn't used anywhere else anymore (e.g. `exists_table()`)
simplify the "get current_schemas() as table" code
pass full `id` to `list_fields()`
align `dbExistsTable()` with `dbListFields()`
simplify `where_schema` in `list_tables()`
align `where_table` with `where_schema` in `list_tables()`
@aviator-app aviator-app bot merged commit ff4b0e8 into r-dbi:main Apr 1, 2024
17 of 18 checks passed
@dpprdan dpprdan deleted the feat/refactor_list branch April 3, 2024 16:43
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants