Spec Proposal: Stream Map "v2" syntax with sequenced declaration of transforms rather than hierarchical #1054
aaronsteers started this conversation in Ideas
Replies: 1 comment 1 reply
-
I really like the idea! The current structure uses a dictionary/object with a key for each mapped stream, so it wasn't possible to have duplicate configs for a single stream. That's possible with this syntax, so maybe we need a rule that the latest config applies? Or do all of them apply, making it possible to have a sequence of transformations in a single stream map run?
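To make the two options concrete, here is a minimal sketch of both resolution rules. The entry shape (a list of dicts keyed by `apply-to`) and the function names are assumptions for illustration only, not anything the SDK provides today.

```python
# Hypothetical v2-style entries: an ordered list, so one stream may appear twice.
entries = [
    {"apply-to": "users", "remove-child-node": "password"},
    {"apply-to": "orders", "rename-to": "purchases"},
    {"apply-to": "users", "rename-to": "accounts"},
]


def resolve_last_wins(entries, stream):
    """Option 1: only the latest entry declared for the stream applies."""
    matching = [e for e in entries if e["apply-to"] == stream]
    return matching[-1:]  # empty list when nothing matches


def resolve_all_in_order(entries, stream):
    """Option 2: every entry for the stream applies, in declaration order."""
    return [e for e in entries if e["apply-to"] == stream]
```

Under the first rule the `users` stream would only be renamed; under the second rule the `password` node would be removed first and the stream renamed afterwards.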
-
The current implementation of stream maps takes a very hierarchical view, with dunder (double underscore) operators driving the logic for special transformations.
https://sdk.meltano.com/en/latest/stream_maps.html
This dialect has its limitations and can't easily be used without having the docs handy.
This discussion proposes a new sequential syntax that is closer in readability to the transferwise `transform-field` syntax while still retaining the flexibility of the current SDK implementation. (Scroll below to see the full contents.)
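As a rough illustration of what "sequenced" could mean in practice, the sketch below models a v2 stream map as an ordered list of steps and applies them one after another to a record. The key names reuse the transform verbs proposed later in this post; the sample streams, the `apply_steps()` helper, and the choice of `fnmatch`-style wildcards are assumptions made for the example. It only handles stream-level `apply-to` targets and static `new-value` values.

```python
from fnmatch import fnmatchcase

# Hypothetical v2 stream map: an ordered list of steps instead of a nested dict.
STREAM_MAPS_V2 = [
    {
        "apply-to": "customers",
        "description": "Drop the raw email column",
        "remove-child-node": "email",
    },
    {
        "apply-to": "customers",
        "description": "Tag every record with its source",
        "add-child-node": "source",
        "new-value": "crm",
        "new-type": "string",
    },
    {
        "apply-to": "cust*",  # wildcard match on the stream name
        "description": "Rename the stream",
        "rename-to": "clients",
    },
]


def apply_steps(stream_name, record, steps):
    """Apply each matching step in declaration order.

    Returns the (possibly renamed) stream name and a transformed copy of the record.
    """
    record = dict(record)
    for step in steps:
        if not fnmatchcase(stream_name, step["apply-to"]):
            continue
        if "remove-child-node" in step:
            pattern = step["remove-child-node"]
            for key in [k for k in record if fnmatchcase(k, pattern)]:
                del record[key]
        if "add-child-node" in step:
            record[step["add-child-node"]] = step.get("new-value")
        if "rename-to" in step:
            # Later steps are matched against the new stream name.
            stream_name = step["rename-to"]
    return stream_name, record
```

For example, `apply_steps("customers", {"id": 1, "email": "a@b.c"}, STREAM_MAPS_V2)` returns `("clients", {"id": 1, "source": "crm"})`, while a record from an unrelated stream passes through unchanged.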
By importing the `re` module and pre-defining a `regex_replace()` function, we can do advanced renames.
We could also pre-define functions called `camel_case()` and `snake_case()`, which would use a similar regex to the above or some other built-in Python methods.
With this general pattern, there's still a lot to tweak in terms of the exact syntax. The above handles most/all of the existing capability, while also better handling type transformations, wildcard applications, and operations on subnodes of the record object.
Possible list of transform verbs:
- `apply-to` - where to apply changes. This can be a stream name, a property, or a subproperty. It can also use wildcards to apply changes to multiple nodes. (Use of regex would be nice, but would require more escaping, and would probably warrant splitting out the stream name part from the property/subproperty part.)
- `description` - doesn't do anything except improve readability and provide human-readable text that can be printed during the apply statement (for instance, if a transform fails or has bad syntax).
- `new-value` - a formula or static value to apply whenever changing values, or whenever creating new property nodes/subnodes.
- `new-type` - a string representing the new JSON type to apply, if the type is changing or if a new node is being created.
- `add-child-node` - if adding nodes, the name of the node to add, relative to `apply-to`. If `apply-to` is a stream, then a top-level property will be added. If `apply-to` is a property, then a subproperty will be added.
- `remove-child-node` - the name of the node (or nodes, if a pattern is provided) to remove. If `apply-to` is a stream name or stream name pattern, then top-level node(s) will be removed. If `apply-to` is a property name or property name pattern, then subnode(s) will be removed relative to that context.
- `new-key-properties` - replaces the list of key properties in the stream.
- `new-replication-key` - replaces the replication key.
- `rename-to` - if applied to a stream, renames the stream; if applied to one or more property nodes, renames those nodes.

Regarding performance
- [...] `schema` of the stream or streams.
- [...] `new-value` operation) can use expressions that rely on the existing value and the node, schema, and stream context.
- [...] `schema` object, as well as the key properties and stream name, which are all known ahead of time.

Backwards compatibility
Since we already accept a `stream_map_config` setting separate from the main `stream_maps` config, we could very likely do this in a backwards compatible way, simply by using the new logic when we see a `stream_map_config: { version: 2 }` and otherwise using the v1 transformation logic.

cc @edgarrmondragon
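The version switch described under "Backwards compatibility" could look roughly like the sketch below. Only the `stream_map_config` and `stream_maps` setting names come from this post; the function names, the returned tuple, and the check that v2 maps are a list are assumptions for illustration, not existing SDK behavior.

```python
def select_stream_map_version(config):
    """Return 2 only when stream_map_config explicitly requests it, else 1."""
    stream_map_config = config.get("stream_map_config") or {}
    return 2 if stream_map_config.get("version") == 2 else 1


def load_stream_maps(config):
    """Pick the parsing path without changing how existing v1 configs behave."""
    version = select_stream_map_version(config)
    stream_maps = config.get("stream_maps")
    if version == 2:
        # Assumption: v2 maps are an ordered list of transform steps.
        if not isinstance(stream_maps, list):
            raise ValueError("v2 stream_maps must be a list of transform steps")
        return "v2", stream_maps
    # v1: the existing hierarchical dict keyed by stream name.
    return "v1", stream_maps or {}
```

Configs that omit `stream_map_config`, or that set it without a `version`, keep taking the v1 path, so existing projects would not need to change anything.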