
Implement Cap on total number of processes on a worker #61

Open
galenatjpl opened this issue Oct 5, 2021 · 11 comments
@galenatjpl
Contributor

galenatjpl commented Oct 5, 2021

Add a ceiling on the total number of RUNNING processes on a worker. The purpose would be to have a predictable limit/cap on the CPU and/or memory usage of a worker at any given time.

So for example, if the individual BPMN limits are:

BPMN A: 5
BPMN B: 3
BPMN C: 2

and there is a cap:

CAP = 4

You could, for example, have at most 1 A, 2 B, and 1 C running at once (totaling 4).
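
For illustration, here's a rough sketch of how a worker-side admission check could combine the per-BPMN limits with a worker-wide cap. This is not CWS code and the class/field names are hypothetical; it just shows the rule that a process is accepted only if both its BPMN's limit and the total cap still have room:

```java
import java.util.HashMap;
import java.util.Map;

public class WorkerAdmission {
    private final int workerCap;                       // e.g. 4
    private final Map<String, Integer> perBpmnLimit;   // e.g. A=5, B=3, C=2
    private final Map<String, Integer> running = new HashMap<>();
    private int totalRunning = 0;

    public WorkerAdmission(int workerCap, Map<String, Integer> perBpmnLimit) {
        this.workerCap = workerCap;
        this.perBpmnLimit = perBpmnLimit;
    }

    /** Reserve a slot if both the per-BPMN limit and the worker-wide cap allow it. */
    public synchronized boolean tryAccept(String bpmnId) {
        int bpmnLimit = perBpmnLimit.getOrDefault(bpmnId, 0);
        int bpmnRunning = running.getOrDefault(bpmnId, 0);
        if (bpmnRunning >= bpmnLimit || totalRunning >= workerCap) {
            return false;                              // cap reached, leave the task pending
        }
        running.put(bpmnId, bpmnRunning + 1);
        totalRunning++;
        return true;
    }

    /** Release a slot when a process completes. */
    public synchronized void release(String bpmnId) {
        running.put(bpmnId, running.getOrDefault(bpmnId, 1) - 1);
        totalRunning--;
    }
}
```

With the numbers above (A=5, B=3, C=2, cap 4), tryAccept would start rejecting any new process once four are running in total, even though the individual BPMN limits would still allow more.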

CURRENT WORKAROUND:

  • limit a worker to a single BPMN type
  • use a smaller/appropriately-sized worker instance class
@eamonford

Implementing a configurable cap on the total number of processes that can run concurrently on a worker, regardless of what BPMNs are assigned to that worker, would allow us to more finely-tune the ASGs such that:

a) ASGs would use EC2 types that only have enough resources to run a single BPMN at a time
b) ASGs would have a much higher max, for greater worker-level parallelism

Being able to restrict workers to run only one BPMN at a time would allow us to still assign multiple BPMNs to an ASG (without the risk of multiple concurrent BPMNs crashing the worker), which would mean an ASG can be reused for many different types of BPMNs that have similar resource needs. Without being able to limit workers to running a single BPMN at a time, each BPMN would need to have its own ASG if we wanted to use smaller machines, which would be very unwieldy.

Ultimately, the benefit of being able to cap the number of concurrent processes on a worker would be a higher level of worker parallelism in the pipeline for the same (or similar) cost of EC2 resources.

@voxparcxls
Collaborator

  • Should we make a new configuration variable for the process cap, e.g. max_processes_per_worker=4? (A rough sketch of reading such a property is below.)

  • If not, should this implementation have a UI component? If it's in the UI, it could start with a default value and be changed later.
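
If the configuration-variable route is chosen, reading it at deploy time should be simple. A minimal sketch, assuming a standard Java properties file and reusing the example name and default from above (everything else hypothetical):

```java
import java.io.FileInputStream;
import java.io.IOException;
import java.util.Properties;

public class WorkerConfig {
    private static final int DEFAULT_MAX_PROCESSES = 4;   // assumed default

    /** Load max_processes_per_worker from a properties file, falling back to a default. */
    public static int loadMaxProcessesPerWorker(String propertiesPath) throws IOException {
        Properties props = new Properties();
        try (FileInputStream in = new FileInputStream(propertiesPath)) {
            props.load(in);
        }
        String value = props.getProperty("max_processes_per_worker");
        return (value == null) ? DEFAULT_MAX_PROCESSES : Integer.parseInt(value.trim());
    }
}
```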

@eamonford

From an M2020 Ops perspective, being able to configure this value at deploy time would be the most important (so having a configuration variable). Being able to change this configuration in the UI might be nice, but not critical.

@voxparcxls
Collaborator

Okay, that's doable.

Getting a UI input that is configurable on the fly, after deployment/installation, would take extra effort.

What do you think, @galenatjpl ?

@eamonford

eamonford commented Oct 18, 2022

Okay! If it would be a lot of effort to make it UI-configurable on the fly, it might not be worth implementing that. I don't think we would need to change it on the fly, because changing this value would probably go alongside changing the EC2 instance types in most cases (for M20 at least), which also wouldn't be done on the fly.

@voxparcxls
Collaborator

Alright, thanks for the response. Will get started on this soon. If any requirements change, let me know.

@galenatjpl
Contributor Author

I agree that doing this at deploy time would be easiest and quickest, and it seems to satisfy M20 needs. That said, the value would have to be stored somewhere (e.g. in a column of the MariaDB cws_workers table), so making it modifiable at runtime via a SQL update is a good middle ground here, without having to implement a UI.
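
To illustrate the SQL-update middle ground (only the cws_workers table name comes from the comment above; the column and key names are assumptions, not the actual schema), a runtime change could be as simple as:

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.SQLException;

public class WorkerCapUpdater {
    /** Set a hypothetical per-worker cap column; returns the number of rows updated. */
    public static int updateCap(String jdbcUrl, String user, String pass,
                                String workerId, int newCap) throws SQLException {
        // Column name max_num_running_procs and key column id are assumptions.
        String sql = "UPDATE cws_workers SET max_num_running_procs = ? WHERE id = ?";
        try (Connection conn = DriverManager.getConnection(jdbcUrl, user, pass);
             PreparedStatement stmt = conn.prepareStatement(sql)) {
            stmt.setInt(1, newCap);
            stmt.setString(2, workerId);
            return stmt.executeUpdate();
        }
    }
}
```

An operator could also run the equivalent UPDATE directly against MariaDB; the worker would just need to re-read the value for the new cap to take effect.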

@eamonford

@galenatjpl That would definitely be a usable option for the operators!

@eamonford

Here's a question: once this is implemented, how will priority be determined if multiple tasks are competing for a worker? For example, suppose BPMN A and BPMN B are both assigned to run on a given worker, and that worker has a configured cap of 1 concurrent process. If both BPMNs are waiting for that worker to free up, then once the worker finishes its current task, how will it decide whether to take on the pending task from BPMN A or BPMN B first? Would it be determined by the timestamps of those scheduled tasks?

@galenatjpl
Contributor Author

@eamonford and @voxparcxls : Per Eamon's question, timestamps (FIFO) and the priority field in the database should determine the ordering of what actually runs on a worker. This is already done in SQL queries on the backend, but the logic and code would have to change slightly for this feature. We want to ensure fair scheduling while avoiding starvation.
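
For example, the selection step could look roughly like the sketch below: pull at most as many pending processes as the worker has free slots under its cap, ordered by priority and then FIFO by creation time. Table and column names are assumptions (not the actual CWS schema), and it assumes a larger priority value means higher priority:

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.List;

public class PendingProcessSelector {
    /** Pick up to freeSlots pending processes for a worker: priority first, then FIFO. */
    public static List<String> selectNext(Connection conn, String workerId, int freeSlots)
            throws SQLException {
        String sql = "SELECT process_instance_id FROM cws_pending_procs "
                   + "WHERE worker_id = ? AND status = 'PENDING' "
                   + "ORDER BY priority DESC, created_time ASC "   // priority, then oldest first
                   + "LIMIT ?";
        List<String> ids = new ArrayList<>();
        try (PreparedStatement stmt = conn.prepareStatement(sql)) {
            stmt.setString(1, workerId);
            stmt.setInt(2, freeSlots);
            try (ResultSet rs = stmt.executeQuery()) {
                while (rs.next()) {
                    ids.add(rs.getString("process_instance_id"));
                }
            }
        }
        return ids;
    }
}
```

Because the oldest tasks at a given priority are always taken first, a steady stream of new submissions can't indefinitely starve an older pending process at the same priority.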

@voxparcxls
Collaborator

Alright, the coordination for avoiding starvation is what I'm working on in ProcessCounter. Thanks for the added detail @galenatjpl
