[epic] Support for large queries from data API #19

Open
7 tasks
rufuspollock opened this issue Oct 20, 2020 · 1 comment

rufuspollock commented Oct 20, 2020

When querying the data API, I want to run queries that return 100k or 1m+ results and download them, so that I can extract the data I want even when the result set is large.

Acceptance

  • Design the solution
    • Consider authorization requirements
  • Write to storage approach
    • Choose storage backend
    • Stream to storage from data API (or trigger background job)
    • Return download URL
  • Set up the switch from "streaming" to "write to storage"
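
As a rough illustration of the "write to storage" items above, here is a minimal sketch. It assumes a local directory stands in for the storage backend (which is still to be chosen) and that rows arrive as an iterator from the data API. The names `export_to_storage` and `DOWNLOAD_BASE_URL` are hypothetical and not part of the existing code.

```python
import csv
import uuid
from pathlib import Path
from typing import Iterable, Sequence

# Hypothetical base URL under which exported files would be served.
DOWNLOAD_BASE_URL = "https://storage.example.com/exports"


def export_to_storage(
    header: Sequence[str],
    rows: Iterable[Sequence[object]],
    storage_dir: Path,
) -> str:
    """Write rows to a CSV file one at a time and return its download URL."""
    storage_dir.mkdir(parents=True, exist_ok=True)
    filename = f"{uuid.uuid4().hex}.csv"
    with open(storage_dir / filename, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(header)
        # rows may be a generator, so the full result is never held in memory.
        for row in rows:
            writer.writerow(row)
    return f"{DOWNLOAD_BASE_URL}/{filename}"
```

With a real backend the file write would be replaced by an upload, and the returned URL would probably need to be signed or otherwise tied to the authorization design.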

Analysis

There are 2 approaches:

  1. Stream the whole result
  2. Extract the query results to storage and give storage url to the user

A hybrid is also possible: stream (option 1) up to some number of results, then switch to option 2.
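
One possible shape for the hybrid switch, as a sketch only: it reads at most `max_streamed_rows + 1` rows before deciding, so small results are returned directly and larger ones are handed to an export step. The function name, the threshold parameter and the `export` callback are all hypothetical.

```python
from itertools import chain, islice
from typing import Callable, Iterable, Iterator, List, Tuple, Union

Row = Tuple[object, ...]


def stream_or_export(
    rows: Iterable[Row],
    max_streamed_rows: int,
    export: Callable[[Iterator[Row]], str],
) -> Tuple[str, Union[List[Row], str]]:
    """Return ("stream", rows) for small results, ("storage", url) otherwise."""
    it = iter(rows)
    # Read one row past the threshold to learn whether more rows exist.
    head = list(islice(it, max_streamed_rows + 1))
    if len(head) <= max_streamed_rows:
        return ("stream", head)
    # Too large to stream: pass the buffered rows plus the rest to the exporter.
    return ("storage", export(chain(head, it)))
```

Buffering the first rows keeps the decision simple, at the cost of holding up to `max_streamed_rows + 1` rows in memory before the first byte is sent.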

There are several advantages of option 2:

  • You have a natural cache structure on disk so that the same query may not need to be recomputed (you can expire exported results after some time period)
  • If your download/stream is interrupted you can resume it from storage (rather than re-running the query)
  • You give the end user the option to share the file for a certain period of time with other users
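
The first two advantages can be sketched as follows, assuming exports are files named after a hash of the exact query text and that file modification time drives expiry. The function names and the `.csv` naming scheme are hypothetical.

```python
import hashlib
import time
from pathlib import Path
from typing import Iterator, Optional


def cache_key(query: str) -> str:
    """Hash the exact query text; no normalization, to avoid false cache hits."""
    return hashlib.sha256(query.encode("utf-8")).hexdigest()


def cached_export(
    query: str,
    storage_dir: Path,
    max_age_seconds: float,
    now: Optional[float] = None,
) -> Optional[Path]:
    """Return the stored export for this query, or None if missing or expired."""
    path = storage_dir / f"{cache_key(query)}.csv"
    if not path.exists():
        return None
    now = time.time() if now is None else now
    if now - path.stat().st_mtime > max_age_seconds:
        path.unlink()  # expired: delete so the query is recomputed
        return None
    return path


def read_from_offset(path: Path, offset: int, chunk_size: int = 65536) -> Iterator[bytes]:
    """Resume an interrupted download by reading from a byte offset."""
    with open(path, "rb") as f:
        f.seek(offset)
        while chunk := f.read(chunk_size):
            yield chunk
```

Over HTTP, the resume case would typically map to a `Range` request against the storage URL rather than a direct file read.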

The disadvantages (at least for small data sizes):

  • It's more complex / more work on backend
  • Slower (greater latency)
  • More complex for user: they have 2 steps where there was one (get query result, extract download url and download)

rufuspollock changed the title from "# [epic] Support for large queries from data API" to "[epic] Support for large queries from data API" on Oct 20, 2020

leomrocha commented Nov 24, 2020

@rufuspollock
The current implementation streams the results; we are testing its performance in some test (but distributed) environments.

The tests went well, and it now seems that any further optimization needs to happen in the Hasura and Postgres queries and views.

This issue is the next step for further improvements, but due to time limitations we won't start on it for the moment.
