SaaS targets: strategy to maintain state and/or other artifacts created #1229

aaronsteers · 2022-11-29T05:54:05Z

Use cases

There are at least three common use cases where targets would like to append their own artifacts to STATE.

Scenario 1: Dead letter queues

Error Handling and dead letter queues for targets #133

Scenario 2: Storing the target's own unique key for inserted/updated records.

SaaS targets like Salesforce will generate a unique surrogate ID upon inserting records, but they are not very performant when it comes to merge upserts on the natural/business key. In some cases, the target will have to pay a performance penalty of looking up each record's unique key one at a time before each upsert/insert can be performed.
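To make Scenario 2 concrete, here is a minimal sketch of a lookup cache that maps each natural key to the surrogate ID the SaaS system returned. The class name `SurrogateKeyCache` and the `insert`/`update` callables are hypothetical stand-ins for real API calls; nothing here is part of the SDK.

```python
from typing import Callable, Dict, Optional


class SurrogateKeyCache:
    """Hypothetical cache of natural key -> surrogate ID assigned by the target system."""

    def __init__(self, known: Optional[Dict[str, str]] = None) -> None:
        # `known` would be restored from wherever the target persists its artifacts.
        self._ids: Dict[str, str] = dict(known or {})

    def upsert(
        self,
        natural_key: str,
        record: dict,
        insert: Callable[[dict], str],
        update: Callable[[str, dict], None],
    ) -> str:
        """Insert on first sight, otherwise update by the cached surrogate ID."""
        remote_id = self._ids.get(natural_key)
        if remote_id is None:
            remote_id = insert(record)  # the API returns the new surrogate ID
            self._ids[natural_key] = remote_id
        else:
            update(remote_id, record)  # no per-record lookup call needed
        return remote_id

    def to_dict(self) -> Dict[str, str]:
        """Serializable snapshot to persist between runs."""
        return dict(self._ids)


# Stand-in API calls for illustration only.
calls = []


def fake_insert(record: dict) -> str:
    calls.append(("insert", record))
    return f"sf-{len(calls)}"


def fake_update(remote_id: str, record: dict) -> None:
    calls.append(("update", remote_id, record))


cache = SurrogateKeyCache()
first_id = cache.upsert("acct-42", {"name": "Acme"}, fake_insert, fake_update)
second_id = cache.upsert("acct-42", {"name": "Acme Inc"}, fake_insert, fake_update)
```

The second call reuses the cached ID, so the only remote operation is the update itself. Where the snapshot from `to_dict()` is stored is exactly the open question of this discussion.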

Scenario 3: "Only once" delivery

The Singer spec mandates that each record be guaranteed to arrive 'at least once', but not 'only once'. Some targets handle the extra record events via a 'merge upsert', but others cannot mitigate duplication as easily. For example, target-apprise delivers a message per record to systems like Slack or SMS. While some amount of duplication is tolerable, duplicated records as the norm are not acceptable, so it is very important to dedupe those records to the extent it is feasible to do so.
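One way a message-style target could dedupe is to remember a fingerprint of every record it has already delivered. The sketch below assumes a hypothetical `DeliveryLog` helper and a `send` callable standing in for the real delivery call; it is an illustration, not how target-apprise works today.

```python
import hashlib
import json
from typing import Callable, Iterable, List, Optional


def record_fingerprint(stream: str, record: dict) -> str:
    """Stable hash of a record; key order does not affect the result."""
    payload = json.dumps(
        {"stream": stream, "record": record},
        sort_keys=True,
        separators=(",", ":"),
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


class DeliveryLog:
    """Hypothetical log of fingerprints for records that were already delivered."""

    def __init__(self, delivered: Optional[Iterable[str]] = None) -> None:
        self._delivered = set(delivered or [])

    def deliver_once(
        self, stream: str, record: dict, send: Callable[[dict], None]
    ) -> bool:
        """Call `send` only for unseen records; return True if it was sent."""
        fingerprint = record_fingerprint(stream, record)
        if fingerprint in self._delivered:
            return False
        send(record)
        # Marked only after `send` returns, so a failed send is retried later.
        self._delivered.add(fingerprint)
        return True

    def to_list(self) -> List[str]:
        """Serializable snapshot to persist between runs."""
        return sorted(self._delivered)


sent: List[dict] = []
log = DeliveryLog()
log.deliver_once("alerts", {"id": 1, "text": "disk full"}, sent.append)
log.deliver_once("alerts", {"text": "disk full", "id": 1}, sent.append)  # duplicate
```

Because the fingerprint is recorded after the send, a crash between the two steps can still produce a duplicate. This narrows the 'at least once' guarantee but does not turn it into a strict 'only once' guarantee.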

Possible implementation via `STATE`

For all of the above scenarios, the target could, in theory at least, append its own artifacts to STATE.

Most taps use a convention of { "bookmarks": {...} } for their own state, so to prevent collisions, the target state could adopt a top-level key such as target_state, resulting in a STATE dictionary that (probably) would look like { "bookmarks": {...}, "target_state": {...} }.
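As a rough sketch of the layout described above, the helpers below attach and strip a `target_state` key without touching the tap's own keys. The function names and the sample bookmark and dead-letter contents are made up for illustration.

```python
import json


def attach_target_state(tap_state: dict, target_state: dict) -> dict:
    """Return a copy of the tap's STATE with the target's artifacts added."""
    merged = dict(tap_state)  # shallow copy; the tap's keys are left untouched
    merged["target_state"] = target_state
    return merged


def strip_target_state(state: dict) -> dict:
    """Return a copy of STATE containing only the tap's own keys."""
    return {key: value for key, value in state.items() if key != "target_state"}


# Illustrative values only.
tap_state = {"bookmarks": {"users": {"replication_key_value": "2022-11-29"}}}
target_state = {"dead_letters": [{"stream": "users", "record": {"id": 7}}]}

merged = attach_target_state(tap_state, target_state)
print(json.dumps(merged))
```

Stripping the key before STATE is handed back to the tap would keep the tap unaware of the target's additions, which limits the practical impact of the collision concern.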

While this would probably work for the vast majority of implementations out there (perhaps all?), it technically violates the Singer Spec condition that targets should treat STATE as a passthrough. That doesn't mean it isn't the best solution - only that it's worth pausing to consider whether it is worth going against the spec, and to consider alternatives if any are available.

Alternative implementation via user-configured "target state backend"

The same strategies for maintaining artifacts in STATE could also be applied to a separate backend, configured as a destination in the target's config.json file. For instance, config.json could accept a target_state_backend_uri setting that takes a URI similar to those used by Meltano's new state backend feature. The difference is that the artifacts would not pollute the tap's own STATE artifact, and results would be sent via API-level CRUD operations rather than through the target's STDOUT.
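A minimal sketch of what resolving such a setting might look like, assuming only a `file://` URI is supported. The `FileStateBackend` class and `backend_from_config` function are hypothetical names, and this does not reflect how Meltano's state backends are implemented.

```python
import json
from pathlib import Path
from urllib.parse import urlparse
from urllib.request import url2pathname


class FileStateBackend:
    """Hypothetical backend that keeps target artifacts in a local JSON file."""

    def __init__(self, path: Path) -> None:
        self.path = path

    def read(self) -> dict:
        """Return the stored artifacts, or an empty dict on first run."""
        if not self.path.exists():
            return {}
        return json.loads(self.path.read_text(encoding="utf-8"))

    def write(self, state: dict) -> None:
        """Overwrite the stored artifacts."""
        self.path.parent.mkdir(parents=True, exist_ok=True)
        self.path.write_text(json.dumps(state), encoding="utf-8")


def backend_from_config(config: dict) -> FileStateBackend:
    """Build a backend from the `target_state_backend_uri` setting."""
    parsed = urlparse(config["target_state_backend_uri"])
    if parsed.scheme != "file":
        # Other schemes (object stores, databases) are out of scope for this sketch.
        raise ValueError(f"Unsupported state backend scheme: {parsed.scheme!r}")
    return FileStateBackend(Path(url2pathname(parsed.path)))
```

A target would call `read()` at startup and `write()` after each successful batch, leaving the STATE messages on STDOUT untouched.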

Alternative implementation via a special target dataset

Another way to manage this state is via the target's own native storage datasets. So, rather than implementing a config spec that accepts target_state_backend_uri, we'd accept table names via settings like dead_letter_queue_table or surrogate_key_lookup_table - each of which would hold the name of the table used to store and subsequently retrieve those internally managed artifacts.

This unfortunately only works for targets that have the ability to store arbitrary datasets within them. All SQL-like targets could implement something like this, in theory at least, but targets like target-apprise or target-miso may not have a traditional 'table' construct where records can be easily stored and retrieved for these use cases. And, as it turns out, those 'app-like' SaaS targets are the ones most likely to need some type of state tracking for dead letter queues, key lookups, message deduplication, etc.
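For a SQL-like target, the dataset approach could look roughly like the sketch below, which uses SQLite as a stand-in for the target's own database. The `DeadLetterTable` class and its column layout are assumptions; only the `dead_letter_queue_table` setting name comes from the proposal above.

```python
import json
import sqlite3
from typing import List, Tuple


def checked_identifier(name: str) -> str:
    """Reject table names that are not plain identifiers before using them in SQL."""
    if not name.isidentifier():
        raise ValueError(f"Invalid table name: {name!r}")
    return name


class DeadLetterTable:
    """Hypothetical dead letter queue stored in a table inside the target database."""

    def __init__(self, conn: sqlite3.Connection, config: dict) -> None:
        self.conn = conn
        self.table = checked_identifier(config["dead_letter_queue_table"])
        self.conn.execute(
            f"CREATE TABLE IF NOT EXISTS {self.table} "
            "(stream TEXT NOT NULL, record TEXT NOT NULL, error TEXT NOT NULL)"
        )

    def put(self, stream: str, record: dict, error: str) -> None:
        """Store a record that could not be loaded, with the reason it failed."""
        self.conn.execute(
            f"INSERT INTO {self.table} (stream, record, error) VALUES (?, ?, ?)",
            (stream, json.dumps(record), error),
        )
        self.conn.commit()

    def pending(self) -> List[Tuple[str, dict, str]]:
        """Return failed records in insertion order for a later retry."""
        rows = self.conn.execute(
            f"SELECT stream, record, error FROM {self.table} ORDER BY rowid"
        ).fetchall()
        return [(stream, json.loads(record), error) for stream, record, error in rows]


conn = sqlite3.connect(":memory:")
queue = DeadLetterTable(conn, {"dead_letter_queue_table": "dead_letters"})
queue.put("users", {"id": 7}, "422 Unprocessable Entity")
```

A surrogate_key_lookup_table could follow the same pattern with a two-column key/ID layout.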

tayloramurphy (Maintainer) · 2022-11-29T16:47:53Z

@aaronsteers this seems like a "simpler" problem to solve given that there are so few targets relative to the tap ecosystem.

Does the "target_state" need to be communicated back to the tap at all? Meaning, if a user ran the tap/target combo with Meltano, would the tap need to read any of the data, or is it just for the target?
- If not, then we could potentially treat this as an extension of the target that is available when the target is based on the SDK. It could be packaged as a performance enhancement for SDK-based targets that isn't necessary for operation but could be very useful (and easily supported by Meltano Cloud).

Wanted to highlight how another company is iterating on their state message spec.

1 reply

edgarrmondragon Nov 29, 2022
Maintainer

@tayloramurphy That's interesting. It's not far from our state structure, but we should probably formalize and document it so it's clear to us and the community where the extension points would be.
