Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow a step to be both a connector and a flow step and dynamically change depending on config #422

Open
3 tasks
brendanheywood opened this issue Aug 4, 2022 · 4 comments · May be fixed by #473
Open
3 tasks
Assignees
Labels
Milestone

Comments

@brendanheywood
Copy link
Contributor

eg a curl step could be a connector and fire once if its upstream is a connector output. But if wired to a flow then it will fire once per row.

  • Any steps which can be flows will be treated as flow steps in the step chooser
  • A step can only be one or the other for any given config, ie it could change based on the upstream connectors
  • or it could have its own config which declares which mode it is in
@brendanheywood
Copy link
Contributor Author

See comments in #377

@brendanheywood
Copy link
Contributor Author

Was mulling on this and thinking if things could be radically simplified in order to achieve this.

Consider the situation if all connector steps accepted and / or produced iterators just like flow steps. But their iterator would only ever have on record in it. Functional that would be identical to how they are now.

A trigger step could produce a stream of events or just one.

But then consider how this works if you compose for example two reader steps, both which previously would start with a connector input and product a flow output. In the new scenario they would both start a flow of 1 and then when given a singleton record they produce N & M records from that single one. Which means if we stack then end to end then it's effectively like a cartesian product where now the resulting iterator has N x M records in it. That example may not be practical but it has a powerful elegance which appeals. Now almost all actions are both with essentially no code effort.

If you have a flow and you want to convert that to a connector then in this scenario I think you would need to have explicit flow caps and all they do is listen to the iterator and wait til they are drained and then they kick of a new iterator which only has the one record in it. All of the complexity around flowlet subgraphs disappears. The two state diagrams for flows vs connector converge into one. I even think this simplifies the way triggers could work

If you have a trigger that starts with a flow you can easily have steps which split off a side iterator which only fires once and it could do it with the first record or do it the last just as easily as it could by doing odds or evens.

If we do migrate to this conceptually I think we can do it all under the hood with very little public api changes. The step classes should not change for most of the steps. I'd everyone to really critique this concept please.

@keevan
Copy link
Contributor

keevan commented Aug 6, 2022

But then consider how this works if you compose for example two reader steps

I've always had a mental model that flow groups worked like for-loops, so this would be like a nested for-loop and potentially you could use the result from the outer loop to determine what to query for the inner loop (in the case of 2 readers). It might not make sense for 2 SQL readers, but maybe if the outer one came from some external JSON and the inner one was an SQL query which might produce 0 or more rows.

I think we can do it all under the hood with very little public api changes.

We probably can do it without changing much of how the flow is authored. I do think there are a number of things that should change in the public api (or certain aspects of it) but these can be done separately.

One in particular I'd like to have happen is this recently logged issue #468

I think all current connector steps should have the potential to produce an iterator just like flow steps. Engine wise, I am not sure if they need to all be converted to flow like iterators, at least initially. I think the mechanics for configuring whether something flows or not (produces an iterable) once set in stone would still allow it to work with the existing - albeit complex - architecture. Then we can refine the internals without having to worry about that aspect once we hit it, but can still enjoy being able to use both current connectors inside flow steps and flow steps as connectors interchangeably sooner.

So if I were to put this down into some order of tasks (high level), it could be:

  • allow steps to change whether they are part of a flow group or not. This will be like the first stage to see whether or not it will work in practice and we can determine any outliers (e.g. writers) and what we can do about them (perhaps a writer "connector" will have an option for us to configure the output it will write, much like the email body of the email step, and in the absense of a body, it will use the "record" as-is). There is also nothing wrong with skipping the default "record" option to write a custom body per "record" if one was desired (which is currently not possible).
  • clean up the usage, api and steps as much as possible (or split them into new issues) when reviewing and changing them to be compatible for both types of input connections.
  • determine if we want explicit flow caps or not. Aside from very simple flows, I think we do want them. But the mechanics behind how they are added / configured could already be determined from the configurations added earlier to support both flow/connector-esque steps.
  • once everything works close to what is expected, then change the internals of how flows and connectors are handled and un-complicate the current architecture (which is working), which should open up more options for different types of flows as mentioned in your comment earlier.

This is more or less of a thought dump and it might be easier to see where to go once the steps are taken, since they need to be taken at one point or another regardless.

@keevan
Copy link
Contributor

keevan commented Aug 8, 2022

Probably somewhere in this mix, it would be good to clean up the step names (lang string) so they no longer reference connector / flow / logic etc as per the comment here: #503 (comment)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
2 participants