Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Superset-generated time range causes SQL error: ParserError("Expected end of statement, found: 00") #11

Open
jdstrand opened this issue Feb 7, 2023 · 5 comments
Labels
bug Something isn't working

Comments

@jdstrand
Copy link
Contributor

jdstrand commented Feb 7, 2023

Superset has the ability to set a time range in its Chart Graph functionality. When set, the resulting SQL has a WHERE clause of this form:

WHERE time >= 2023-01-08 00:00:00.000000
  AND time < 2023-01-15 00:00:00.000000

This results in: executing query: failed to create Flight record reader: arrow/flight: could not create flight reader: arrow/ipc: could not read schema from stream: arrow/ipc: could not read message schema: rpc error: code = InvalidArgument desc = Error while planning query: SQL error: ParserError("Expected end of statement, found: 00").

A simple reproducer that generates an error without superset is:

SELECT * FROM "some_table"
WHERE time >= 2023-01-08 00:00:00.000000
  AND time < 2023-01-15 00:00:00.000000
LIMIT 10;

We can make the error go away if we quote the date strings:

SELECT * FROM "some_table"
WHERE time >= '2023-01-08 00:00:00.000000'
  AND time < '2023-01-15 00:00:00.000000'
LIMIT 10;

https://arrow.apache.org/datafusion/user-guide/sql/select.html#where-clause is pretty lean on details, but AIUI, it is using sqlparser which is ANSI SQL:2011. I tried to find the grammar for that but apparently it is behind a paywall. Postgres is mostly ANSI SQL:2016 compliant and section 8.5.1.3 of https://www.postgresql.org/docs/15/datatype-datetime.html speaks of TIMESTAMP '2004-10-19 10:23:54' being a valid SQL standard timestamp.

As such, this is likely most correct (ie, flightsql-dbapi likely should generate this form of WHERE):

SELECT * FROM "some_table"
WHERE time >= timestamp '2023-01-08 00:00:00.000000'
  AND time < timestamp '2023-01-15 00:00:00.000000'
LIMIT 10;
@brettbuddin brettbuddin self-assigned this Feb 8, 2023
@brettbuddin brettbuddin added the bug Something isn't working label Feb 8, 2023
@brettbuddin
Copy link
Contributor

brettbuddin commented Feb 8, 2023

@jdstrand I took a look and I'm not able to reproduce the issue on my side. I added some logging to print out the query handed to the DB API layer. Here's what my Superset UI shows:

Screenshot 2023-02-07 at 8 12 43 PM

Screenshot 2023-02-07 at 8 15 33 PM

Here are the queries its producing:

superset_app          | EXECUTE SELECT time AS __timestamp,
superset_app          |                AVG(co) AS "AVG(co)"
superset_app          | FROM iox."airSensors"
superset_app          | WHERE time >= '2023-02-07 01:11:41.000000'
superset_app          |   AND time < '2023-02-08 01:11:41.000000'
superset_app          | GROUP BY time
superset_app          | ORDER BY "AVG(co)" DESC
superset_app          | LIMIT 10000 None
superset_app          | EXECUTE SELECT co AS co
superset_app          | FROM iox."airSensors"
superset_app          | WHERE time >= '2023-02-08 00:00:00.000000'
superset_app          |   AND time < '2023-02-08 00:00:00.000000'
superset_app          | GROUP BY co
superset_app          | LIMIT 1000 None

The trailing None is just the parameter list passed to the execute method in the DB API interface and can be ignored.

You can see it has single-quoted the datetimes in my case. I've tried a few variants including the custom, relative and "advanced" ranges. What am I missing?

@jdstrand
Copy link
Contributor Author

jdstrand commented Feb 8, 2023

Reproducer from within Superset:

  1. follow https://docs.influxdata.com/influxdb/cloud-iox/visualize-data/superset/. My checkout is superset 9dfaad772d2ac42c35519a55e7b3df27ca03b072
  2. connect to an IOx database/bucket/namespace using datafusion+flightsql://<CLOUD>.cloud2.influxdata.com:443?bucket-name=BUCKETNAME&token=TOKEN (I chose an internal Multi-Tenant Cloud cluster, but presumably a public one would work too)
  3. go to the SQL lab and enter this (assumes that "some_table" has some fields, a timestamp column, etc):
    SELECT * FROM "some_table"
    LIMIT 1000
    
  4. Click Run and observe returned rows
  5. Click 'Create Chart'
  6. Create the chart:
    • Select 'Time Series Area' chart
    • Time Column: time (or whatever your timestamp column is)
    • Query:
      • Metrics: select a column and choose COUNT as aggregate
      • Dimensions: select another column
  7. Click 'Update Chart'. It should display something
  8. Click the 'Time Range' drop down, choose Custom and create a time range (under 'Actual' mine shows 2022-11-30 ≤ col < 2023-02-08). Click Apply
  9. Click 'Update Chart' and observe the error: Error: Flight returned invalid argument error, with message: Error while planning query: SQL error: ParserError("Expected end of statement, found: 00"). gRPC client debug context: UNKNOWN:Error received from peer ipv4:<redacted>:443 {grpc_message:"Error while planning query: SQL error: ParserError(\"Expected end of statement, found: 00\")", grpc_status:3, created_time:"2023-02-08T15:14:42.395137567+00:00"}. Client context: IOError: Server never sent a data message. Detail: Internal
  10. Click the ... icon in the upper right and click View Query. It shows unquoted times:
    WHERE time >= 2022-11-30 00:00:00.000000
      AND time < 2023-02-08 00:00:00.000000
    

Modifying step 8 to use 'No filter' makes it work again. Using 'Last', 'Previous' or 'Advanced' all causes the error. I also mimicked your use of custom with 'Start Relative Date/Time: 1 days before' and 'End: Now' to have the equivalent hover over in superset and still see the same behavior.

Updating step 6 to use 'Time Series Bar', 'Time Series Line', or 'Time Series Table' all results in the same error.

Wondering if it has something to do with the data here is an updated query on real data:

SELECT action, actor, time FROM "gh-audit"
WHERE actor = 'dependabot[bot]'
LIMIT 4

Clicking Run and downloading the CSV:

action,actor,time
pull_request.create,dependabot[bot],2023-01-24 13:31:48.795000
pull_request.create,dependabot[bot],2023-01-24 09:57:10.684000
pull_request.create,dependabot[bot],2023-01-24 01:06:51.266000
pull_request.create,dependabot[bot],2023-01-24 01:08:43.016000

If I start with step '5' above, I can still reproduce (Metrics is COUNT(action) and Dimension is actor). Presumably, this could be imported as a table in a bucket/namespace you own.

@brettbuddin
Copy link
Contributor

I'll review this today. Thanks for the thorough instructions.

@brettbuddin
Copy link
Contributor

I was able to reproduce this issue as @jdstrand describes above. The difference between the way he entered the chart builder is that he's coming in via the SQL Lab. The SQL Lab flow allows you to build a query that can then be used as a virtual table (via a sub-select). Here's a screenshot comparing the output SQL for both cases:

Screenshot 2023-02-08 at 3 05 32 PM

I'm unsure what the difference is in how SQLAlchemy is involved—or if it's using a completely different way of generating queries if querying via the virtual table. Looking into that now.

@brettbuddin
Copy link
Contributor

Even weirder, @jdstrand: If you create a Dataset from the SQL Lab query and go through the same flow, you arrive at this:

Screenshot 2023-02-08 at 3 17 07 PM

So if it's coming from a Dataset at all—whether it's the abstract table or a specific select statement—it is able to quote the dates correctly. However, if you come in from the SQL Lab and don't attempt to persist the query into a Dataset record, it doesn't single-quote them.

@brettbuddin brettbuddin removed their assignment Feb 16, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

2 participants