Tap and Target performance benchmarks #906
-
This benchmark data was provided by @BuzzCutNorman using

Test: Full Stack Overflow

| Table | Start DTTM | End DTTM | Columns | Rows | Duration/Minutes |
|---|---|---|---|---|---|
| Badges | 08-10 18:36 | 08-10 19:07 | 4 | 8042005 | 31 |
| Comments | 08-10 19:07 | 08-10 21:55 | 6 | 24534730 | 168 |
| LinkTypes | 08-10 21:55 | 08-10 21:55 | 2 | 2 | 0 |
| PostLinks | 08-10 21:55 | 08-10 22:01 | 5 | 1421208 | 6 |
| PostTypes | 08-10 22:01 | 08-10 22:01 | 2 | 8 | 0 |
| Posts | 08-10 22:01 | 08-11 00:20 | 20 | 17142169 | 139 |
| Users | 08-11 00:20 | 08-11 00:34 | 14 | 2465713 | 14 |
| UsersTest | 08-11 00:34 | 08-11 00:34 | 14 | 5000 | 0 |
| VoteTypes | 08-11 00:34 | 08-11 00:34 | 1 | 15 | 0 |
| Votes | 08-11 00:34 | 08-11 05:35 | 6 | 52928720 | 301 |
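To make the table easier to compare across runs, here is a rough sketch that turns rows and duration into rows per minute. The numbers are copied from the table; the `runs` dictionary and the `rows_per_minute` helper are names made up for illustration. Durations are only recorded to the minute, so the rates are approximate, and tables that finished in under a minute are left out.

```python
# Rows and duration (minutes) copied from the benchmark table.
# Tables with a recorded duration of 0 minutes are omitted because
# a rate cannot be derived from them.
runs = {
    "Badges": (8_042_005, 31),
    "Comments": (24_534_730, 168),
    "PostLinks": (1_421_208, 6),
    "Posts": (17_142_169, 139),
    "Users": (2_465_713, 14),
    "Votes": (52_928_720, 301),
}


def rows_per_minute(rows: int, minutes: int) -> float:
    """Approximate throughput; minutes must be greater than zero."""
    return rows / minutes


throughput = {table: rows_per_minute(r, m) for table, (r, m) in runs.items()}

# Print the tables from fastest to slowest.
for table, rate in sorted(throughput.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{table:<10} {rate:>12,.0f} rows/min")
```

On these numbers Badges (4 columns) comes out fastest and Posts (20 columns) slowest, which hints that row width matters, though a single run is not enough to say so with confidence.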
-
Summary
I have been trying out some SQLAlchemy options to see how they affect extract and load times, and thought I would share what I have come across so far. The biggest boost came from the target having executemany enabled. I think SQLAlchemy uses executemany_mode=values_only for Postgres by default, so I had to force executemany off to test how big a benefit it and values_plus_batch had. The benefit was a 10-minute reduction in extract and load times. Next up is trying some of the SQLAlchemy options against the larger tables in the StackOverflow2013 database. I will call out one item not mentioned below that I didn't notice until late in my testing: the SDK turns on streaming results by default.

Setup
Source: VM, 4 x vCPU, 16 GB memory
Target: VM, 4 x vCPU, 16 GB memory
Meltano: VM, 2 x vCPU, 8 GB memory

Results
Legend:
Other Links:
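For anyone who wants a feel for why executemany matters, here is a minimal sketch of the general idea. It is not the setup used above: it swaps in Python's built-in sqlite3 module and an in-memory database instead of SQLAlchemy and Postgres, and the table names, row data, and helper functions are all made up for illustration. With an in-memory database there is no network round trip, so the gap is likely much smaller than against a remote target.

```python
import sqlite3
import time


def load_row_by_row(conn, table, rows):
    """Issue one INSERT call per record."""
    for row in rows:
        conn.execute(f"INSERT INTO {table} (id, name) VALUES (?, ?)", row)
    conn.commit()


def load_executemany(conn, table, rows):
    """Hand the whole batch to the driver in a single executemany call."""
    conn.executemany(f"INSERT INTO {table} (id, name) VALUES (?, ?)", rows)
    conn.commit()


conn = sqlite3.connect(":memory:")
for table in ("badges_single", "badges_many"):  # hypothetical table names
    conn.execute(f"CREATE TABLE {table} (id INTEGER PRIMARY KEY, name TEXT)")

rows = [(i, f"badge-{i}") for i in range(50_000)]  # synthetic records

for table, loader in (
    ("badges_single", load_row_by_row),
    ("badges_many", load_executemany),
):
    start = time.perf_counter()
    loader(conn, table, rows)
    elapsed = time.perf_counter() - start
    print(f"{loader.__name__:<18} {elapsed:.3f}s")

# Both approaches should load the same number of rows.
count_single = conn.execute("SELECT COUNT(*) FROM badges_single").fetchone()[0]
count_many = conn.execute("SELECT COUNT(*) FROM badges_many").fetchone()[0]
```

Timings will vary by machine, so treat the printed numbers as a relative comparison only.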
-
Would getting good metrics logging in the SDK help with this? I imagine it would at least help to compare record-by-record vs batch by looking at a record count timeseries in e.g. Prometheus. For example, backpressure would become apparent.
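As a rough illustration of what a record count timeseries could show, here is a small sketch that buckets record emit times into fixed intervals. It does not use the SDK's metrics or Prometheus; `records_per_interval` and the two synthetic timestamp lists are hypothetical and only stand in for real metric points.

```python
from collections import Counter


def records_per_interval(timestamps, interval_seconds=60):
    """Count records per fixed interval, given emit times in seconds."""
    counts = Counter(int(ts // interval_seconds) for ts in timestamps)
    if not counts:
        return []
    # Fill in empty intervals with 0 so stalls show up in the series.
    return [counts.get(bucket, 0) for bucket in range(max(counts) + 1)]


# Synthetic stream: two records per second for five minutes.
steady = [i * 0.5 for i in range(600)]

# Synthetic stream: one minute of records, a two-minute stall, then
# one more minute of records (e.g. a tap waiting on a slow target).
stalled = [i * 0.5 for i in range(120)] + [180 + i * 0.5 for i in range(120)]

steady_series = records_per_interval(steady)
stalled_series = records_per_interval(stalled)

print("steady :", steady_series)
print("stalled:", stalled_series)
```

The zero-count intervals in the second series are the kind of gap that would suggest backpressure when plotted over time.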
-
Based on the new (still in progress)
Environment and project details
Findings and Caveats
Findings
Caveats

Next steps
These performance metrics are only for extraction to local disk. As a next step, we would perform a similar test to analyze load performance back into Snowflake, with and without batching enabled.
-
I'm opening this discussion to share general performance benchmarks. We can use these as case studies and reference points as we work on SDK-level performance improvements.