Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

adbc_driver_postgres flatten multi dimensional array in Postgres #2063

Open
andrenmo opened this issue Aug 6, 2024 · 3 comments
Open

adbc_driver_postgres flatten multi dimensional array in Postgres #2063

andrenmo opened this issue Aug 6, 2024 · 3 comments
Labels
Type: bug Something isn't working

Comments

@andrenmo
Copy link

andrenmo commented Aug 6, 2024

What happened?

When querying a multi dimensional array in Postgres the returned result "loose" the dimensions, e. g. a 2 dimension array is returned as 1 dimension array, the query:

select array[[1, 1], [2, 3]]

It should return:

[([[1, 1],[2, 3]],)]

It returns:

[([1, 1, 2, 3],)]

Stack Trace

No response

How can we reproduce the bug?

import adbc_driver_postgresql.dbapi

conn_string = 'postgresql://postgres:XXXX@localhost:5432/postgres'
conn = adbc_driver_postgresql.dbapi.connect(conn_string)

cursor = conn.cursor()
query = 'select array[[1, 1], [2, 3]]'
cursor.execute(query)
cursor.fetchall()

Environment/Setup

macOS
PostgreSQL 15
adbc-driver-postgresql 1.1.0

@andrenmo andrenmo added the Type: bug Something isn't working label Aug 6, 2024
@paleolimbot
Copy link
Member

paleolimbot commented Aug 6, 2024

Thank you for opening!

One of the challenges with the Postgres array type is that it can (in theory) have a different number of dimensions for each item, whereas in Arrow land we need to know the number of dimensions in advance. For everything except a single dimension (where Arrow's list type can handle items of different sizes), we would need to know the complete shape in advance (and perhaps return the new tensor extension type).

In the near term we could error when we encounter a number of dimensions that is not exactly one? I don't think we have a mechanism to "warn" or "notice" that we have potentially lost some information.

@andrenmo
Copy link
Author

andrenmo commented Aug 6, 2024

Thanks for the response, in a ideal world this query:

select array[[1, 1], [2, 3]]
union all
select array[[1, 1]]

would return

[([[1, 1], [2, 3]],), ([[1, 1], [None, None]],)]

but an warning (or error that can be silenced with a flag) at least can help to debug why the data is different.

@paleolimbot
Copy link
Member

but an warning (or error that can be silenced with a flag) at least can help to debug why the data is different.

We could probably wire up an error that could be silenced with a statement option today. There is an open issue about warnings/notices which might require a spec change (#1243).

This is another place where the ability to request a schema (#1514) would be helpful as a workaround. It is difficult for the driver to predict this kind of thing without inventing an extension type or implementing SQLite-ish type learning of some kind, but a user might know that they are going to need (e.g.) a 2D array out of a specific column in their result.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Type: bug Something isn't working
Projects
None yet
Development

No branches or pull requests

2 participants