You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
When querying the data API I want to be able to make queries and get results with 100k or 1m+ results and download them so that I can extract the data I want even if larger
Acceptance
Design the solution
Consider authorization considerations
Write to storage approach
Choose storage backend
Stream to storage from data API (or trigger background job)
Return download URL
Setup switch from "streaming" to write to storage
Analysis
There are 2 approaches:
Stream the whole result
Extract the query results to storage and give storage url to the user
One can also have hybrid e.g. do the former up to some number of results and then switch to 2.
There are several advantages of option 2:
You have a natural cache structure on disk so that the same query may not need to be recomputed (you can expire exported results after some time period)
If your download/stream is interrupted you can resume it from storage (rather than re-running the query)
You give the end user the option to share the file for a certain period of time with other users
The disadvantages (at least the small data sizes):
It's more complex / more work on backend
Slower (greater latency)
More complex for user: they have 2 steps where there was one (get query result, extract download url and download)
The text was updated successfully, but these errors were encountered:
rufuspollock
changed the title
# [epic] Support for large queries from data API
[epic] Support for large queries from data API
Oct 20, 2020
When querying the data API I want to be able to make queries and get results with 100k or 1m+ results and download them so that I can extract the data I want even if larger
Acceptance
Analysis
There are 2 approaches:
One can also have hybrid e.g. do the former up to some number of results and then switch to 2.
There are several advantages of option 2:
The disadvantages (at least the small data sizes):
The text was updated successfully, but these errors were encountered: