SaaS targets: strategy to maintain state and/or other artifacts created #1229
aaronsteers started this conversation in General
-
@aaronsteers this seems like a "simpler" problem to solve given that there are so few targets relative to the tap ecosystem.
Wanted to highlight how another company is iterating on their state message spec.
-
Use cases
There are at least three common use cases where targets would like to append their own artifacts to `STATE`.
Scenario 1: Dead letter queues
Scenario 2: Storing the target's own unique key for inserted/updated records.
SaaS targets like Salesforce will generate a unique surrogate ID upon inserting records, but they are not very performant when it comes to merge upserts on the natural/business key. In some cases, the target will have to pay a performance penalty of looking up each record's unique key one at a time before each upsert/insert can be performed.
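To make the lookup penalty concrete, here is a minimal sketch of the kind of artifact a target might want to keep: a map from natural key to the surrogate ID the target generated. The names `SurrogateKeyCache` and `FakeSaaS`, and the `external_id` field, are hypothetical and exist only for illustration; they are not part of any real target or SDK.

```python
from typing import Callable, Dict, Optional


class SurrogateKeyCache:
    """Remembers which target-generated ID belongs to each natural key."""

    def __init__(self, known: Optional[Dict[str, str]] = None) -> None:
        self._ids: Dict[str, str] = dict(known or {})

    def upsert(
        self,
        natural_key: str,
        record: dict,
        lookup: Callable[[str], Optional[str]],
        insert: Callable[[dict], str],
        update: Callable[[str, dict], None],
    ) -> str:
        """Write one record, paying for a lookup only on a cache miss."""
        surrogate_id = self._ids.get(natural_key)
        if surrogate_id is None:
            surrogate_id = lookup(natural_key)  # the slow per-record call
        if surrogate_id is None:
            surrogate_id = insert(record)
        else:
            update(surrogate_id, record)
        self._ids[natural_key] = surrogate_id
        return surrogate_id

    def to_dict(self) -> Dict[str, str]:
        """Return a JSON-serializable copy that can be persisted between runs."""
        return dict(self._ids)


class FakeSaaS:
    """Hypothetical stand-in for a SaaS API; counts the expensive lookups."""

    def __init__(self) -> None:
        self.rows: Dict[str, dict] = {}
        self.lookups = 0

    def lookup(self, natural_key: str) -> Optional[str]:
        self.lookups += 1
        for surrogate_id, row in self.rows.items():
            if row["external_id"] == natural_key:
                return surrogate_id
        return None

    def insert(self, record: dict) -> str:
        surrogate_id = f"sf-{len(self.rows) + 1}"
        self.rows[surrogate_id] = record
        return surrogate_id

    def update(self, surrogate_id: str, record: dict) -> None:
        self.rows[surrogate_id] = record
```

If the output of `to_dict()` survives between runs, repeat upserts for a known key skip the lookup entirely. Where that dictionary should live is the open question for the rest of this post.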
Scenario 3: "Only once" delivery
The Singer spec mandates that each record be guaranteed to arrive 'at least once', but not 'only once'. Some targets handle the extra record events via a 'merge upsert', but some cannot mitigate duplication issues as easily. For example, `target-apprise` delivers a message per record to systems like Slack or SMS. While some amount of duplication is tolerable, duplicated records as the norm are not acceptable, so it is very important to dedupe those records to the extent it is feasible to do so.
Possible implementation via `STATE`
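As a rough picture of what this section proposes, the sketch below has a target add its own artifacts (here, fingerprints of already-delivered records, as in Scenario 3) under a separate top-level key before re-emitting the tap's state. The helper names and the `delivered` field are assumptions made for illustration, and `target_state` is the key name suggested in this post, not something the Singer spec defines.

```python
import hashlib
import json
from typing import Tuple

# Key name suggested in this post; NOT part of the Singer spec.
TARGET_STATE_KEY = "target_state"


def record_fingerprint(stream: str, record: dict) -> str:
    """Stable hash of a record, usable to detect a repeat delivery."""
    payload = json.dumps([stream, record], sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def merge_target_state(tap_state: dict, target_artifacts: dict) -> dict:
    """Return a copy of the tap's state with the target's artifacts added."""
    merged = dict(tap_state)
    merged[TARGET_STATE_KEY] = target_artifacts
    return merged


def split_state(state: dict) -> Tuple[dict, dict]:
    """Separate an incoming state into (tap part, target part)."""
    tap_state = {k: v for k, v in state.items() if k != TARGET_STATE_KEY}
    return tap_state, state.get(TARGET_STATE_KEY, {})


def state_message(tap_state: dict, target_artifacts: dict) -> str:
    """Serialize the merged state as a Singer STATE message line."""
    value = merge_target_state(tap_state, target_artifacts)
    return json.dumps({"type": "STATE", "value": value})
```

On the next run the target would call `split_state` on the state it is handed, skip any record whose fingerprint is already listed, and leave the tap's portion untouched.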
For all of the above scenarios, the target could, in theory at least, append its own artifacts to `STATE`.
Most taps use a convention of `{ "bookmarks": {...} }` for their own state, so to prevent collisions, the target state could adopt a top-level key such as `target_state`, resulting in a `STATE` dictionary that (probably) would look like `{ "bookmarks": {...}, "target_state": {...} }`.
While this probably would work for the vast majority of implementations out there (perhaps all?), it technically violates the Singer Spec condition that targets should treat `STATE` as a passthrough. That doesn't mean it isn't the best solution, only that it's worth a pause to consider whether it is worth going against spec, and to consider alternatives if any are available.
Alternative implementation via user-configured "target state backend"
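A minimal sketch of this alternative, under stated assumptions: the target reads a `target_state_backend_uri` setting from its config and persists its artifacts there with plain read/write calls instead of emitting them on `STDOUT`. Only a `file://` URI is handled here, and `FileTargetStateBackend` and `backend_from_config` are hypothetical names; this does not reproduce Meltano's state backend API.

```python
import json
from pathlib import Path
from urllib.parse import urlsplit
from urllib.request import url2pathname


class FileTargetStateBackend:
    """Stores the target's artifacts as one JSON document on local disk."""

    def __init__(self, path: Path) -> None:
        self.path = path

    def read(self) -> dict:
        """Load stored artifacts, or an empty dict on the first run."""
        if not self.path.exists():
            return {}
        return json.loads(self.path.read_text(encoding="utf-8"))

    def write(self, artifacts: dict) -> None:
        """Overwrite the stored artifacts."""
        self.path.parent.mkdir(parents=True, exist_ok=True)
        self.path.write_text(json.dumps(artifacts), encoding="utf-8")


def backend_from_config(config: dict) -> FileTargetStateBackend:
    """Pick a backend from the (hypothetical) target_state_backend_uri setting."""
    parts = urlsplit(config["target_state_backend_uri"])
    if parts.scheme != "file":
        # Other schemes (object stores, databases) are out of scope for this sketch.
        raise ValueError(f"unsupported backend scheme: {parts.scheme!r}")
    return FileTargetStateBackend(Path(url2pathname(parts.path)))
```

Because nothing is written into the tap's state, the `STATE` passthrough rule stays intact; the trade-off is one more piece of configuration for the user to manage.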
All of the same above-mentioned strategies to maintain artifacts in `STATE` could also be applied to a configured backend that is specified as a destination within the target's `config.json` file. For instance, the `config.json` could accept a `target_state_backend_uri`, which could accept a URI similar to Meltano's new state backend feature, except that the artifacts would not pollute the tap's own `STATE` artifact and results would be sent via API-level CRUD operations instead of being emitted on the target's `STDOUT`.
Alternative implementation via a special target dataset
Another way to manage this state is via the target's own native storage datasets. So, rather than implementing a config spec that accepts `target_state_backend_uri`, we'd accept table names via settings like `dead_letter_queue_table` or `surrogate_key_lookup_table`, each of which would hold the name of the table used to store and subsequently retrieve those internally-managed artifacts.
This unfortunately only works for targets that have the ability to store arbitrary datasets within them. All SQL-like targets could implement something like this, in theory at least, but targets like `target-apprise` or `target-miso` may not have a traditional 'table' construct where records can be easily stored and retrieved for these use cases. And, as it turns out, those 'app-like' SaaS targets are the ones that will be most likely to need some type of state tracking for dead letter queues, key lookups, message deduplication, etc.
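For a SQL-like target, the table-based approach could look roughly like the sketch below. It uses SQLite from the Python standard library as a stand-in for the target's own database; the column layout and the function names are assumptions made for illustration, and only the `dead_letter_queue_table` setting name comes from the discussion above.

```python
import json
import sqlite3
from typing import List, Tuple


def _checked(table: str) -> str:
    """Quote a user-configured table name, rejecting anything unsafe."""
    if not table.isidentifier():
        raise ValueError(f"unsafe table name: {table!r}")
    return f'"{table}"'


def ensure_dlq_table(conn: sqlite3.Connection, table: str) -> None:
    """Create the dead letter table if it does not exist yet."""
    conn.execute(
        f"CREATE TABLE IF NOT EXISTS {_checked(table)} "
        "(stream TEXT NOT NULL, record_json TEXT NOT NULL, error TEXT NOT NULL)"
    )


def park_failed_record(
    conn: sqlite3.Connection, table: str, stream: str, record: dict, error: str
) -> None:
    """Store a record that could not be delivered, along with the reason."""
    conn.execute(
        f"INSERT INTO {_checked(table)} (stream, record_json, error) VALUES (?, ?, ?)",
        (stream, json.dumps(record), error),
    )
    conn.commit()


def pending_records(conn: sqlite3.Connection, table: str) -> List[Tuple[str, dict, str]]:
    """Return parked records in insertion order so they can be retried."""
    rows = conn.execute(
        f"SELECT stream, record_json, error FROM {_checked(table)} ORDER BY rowid"
    ).fetchall()
    return [(stream, json.loads(raw), error) for stream, raw, error in rows]
```

A `surrogate_key_lookup_table` could follow the same pattern with a natural-key column and a surrogate-ID column. None of this helps targets that have no table-like storage of their own.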