
Adds metrics on how different queues have different thread amounts allocated to them #37

Open

wants to merge 2 commits into master

Conversation

hbontempo-br

Problem

Latency is a great metric for workloads that are already starving for resources, but some workloads are critical and we want to see a problem forming before it actually starts.
How can we measure whether a given queue is about to "starve" for resources?

Idea

Measure how busy the threads able to process a given queue are.
If many threads are available, everything is fine; but as more and more threads become busy, the queue has less and less available capacity, until all threads are busy and a backlog starts to form.

Proposed Solution

Extract metrics from ::Sidekiq::ProcessSet.
::Sidekiq::ProcessSet returns the set of running processes. Each process exposes the list of queues it is processing and how busy its threads are, so we can simply aggregate that information per queue, as sketched below.
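A minimal sketch of the per-queue aggregation, using illustrative variable names (the actual diff may differ):

```ruby
require "sidekiq/api"

queue_data = Hash.new { |hash, queue| hash[queue] = { busy: 0, available: 0 } }

::Sidekiq::ProcessSet.new.each do |process|
  # Each process entry exposes the queues it listens to, its total thread
  # count ("concurrency") and the number of currently busy threads ("busy").
  process["queues"].each do |queue|
    queue_data[queue][:busy]      += process["busy"]
    queue_data[queue][:available] += process["concurrency"] - process["busy"]
  end
end
```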

Member

@Envek Envek left a comment

Thank you for your pull request!

It is a bit unexpected to see a desire to monitor underutilization and try to predict full utilization, because in many apps, I believe, load variability results in spikes from 100% to 0% load. Could you tell me a bit more about your load profile and how you plan to fight this noise?

But I suppose that, aggregated over a big enough time window, it can make sense and help predict full utilization in advance.

Also, could you please measure how long this metric calculation takes in your production environment (with a lot of running worker processes)? I suppose it should be fast, as it doesn't iterate over the jobs in the queues, so there is no need to disable it by default (that was my first thought when I saw .each inside .each), but I want to be sure.
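For illustration, one way to measure that cost from a production console could look like this (just a sketch, not part of the PR):

```ruby
require "benchmark"
require "sidekiq/api"

# Time a single pass over the process set, mirroring the collector's iteration shape.
elapsed = Benchmark.realtime do
  ::Sidekiq::ProcessSet.new.each do |process|
    process["queues"].each { |_queue| } # no-op body; we only care about iteration cost
  end
end

puts format("ProcessSet aggregation took %.2f ms", elapsed * 1000)
```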

queue_data[queue][:available] += process['concurrency'] - process['busy']
end
end
sidekiq_active_workers_count.set({}, process_data[:active])
Member

Is process_data[:active] the same as stats.workers_size? (it is better to keep backward compatibility)

::Sidekiq::ProcessSet.new.each do |process|
process_data[:active] += process['concurrency']
process_data[:busy] += process['busy']
process_data[:available] += process['concurrency'] - process['busy']
Member

What is the point of calculating the available thread count here? It could just as well be calculated in the monitoring system.
