
Adds metrics on how different queues have different thread amounts allocated to them #37

Open

wants to merge 2 commits into master

Conversation

hbontempo-br

Problem

Latency is a great metric for workloads that are already starving for resources, but some workloads are critical and we want to see a problem forming before it actually starts.
How can we measure whether a given queue is about to "starve" for resources?

Idea

Measure how busy the threads able to process a given queue are.
If many threads are available, everything is fine; but as more and more threads become busy, the queue has less and less available capacity, until all threads are busy and a backlog starts to form.

Proposed Solution

Extract metrics from ::Sidekiq::ProcessSet.
::Sidekiq::ProcessSet returns the set of running processes. Each process exposes the list of queues it is processing and how busy its threads are, so we can simply aggregate that information per queue, as sketched below.
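A minimal sketch of the per-queue aggregation, using illustrative variable names (the actual diff may differ):

```ruby
require "sidekiq/api"

queue_data = Hash.new { |hash, queue| hash[queue] = { busy: 0, available: 0 } }

::Sidekiq::ProcessSet.new.each do |process|
  # Each process entry exposes the queues it listens to, its total thread
  # count ("concurrency") and the number of currently busy threads ("busy").
  process["queues"].each do |queue|
    queue_data[queue][:busy]      += process["busy"]
    queue_data[queue][:available] += process["concurrency"] - process["busy"]
  end
end
```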

Member

@Envek Envek left a comment

Thank you for your pull request!

It is a bit unexpected to see a desire to monitor underutilization and try to predict full utilization, because in many apps, I believe, load variability results in spikes from 100% to 0% load. Could you tell me a bit more about your load profile and how you plan to fight this noise?

But I suppose that, aggregated over a big enough time window, it can make sense and help predict full utilization in advance.

Also, could you please measure how long this metric calculation takes in your production environment (with a lot of running worker processes)? I suppose it should be fast, as it doesn't iterate over the jobs in the queues, so there is no need to disable it by default (that was my first thought when I saw .each inside .each), but I want to be sure.
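For illustration, one way to measure that cost from a production console could look like this (just a sketch, not part of the PR):

```ruby
require "benchmark"
require "sidekiq/api"

# Time a single pass over the process set, mirroring the collector's iteration shape.
elapsed = Benchmark.realtime do
  ::Sidekiq::ProcessSet.new.each do |process|
    process["queues"].each { |_queue| } # no-op body; we only care about iteration cost
  end
end

puts format("ProcessSet aggregation took %.2f ms", elapsed * 1000)
```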

queue_data[queue][:available] += process['concurrency'] - process['busy']
end
end
sidekiq_active_workers_count.set({}, process_data[:active])
Member

Is process_data[:active] the same as stats.workers_size? (it is better to keep backward compatibility)

::Sidekiq::ProcessSet.new.each do |process|
process_data[:active] += process['concurrency']
process_data[:busy] += process['busy']
process_data[:available] += process['concurrency'] - process['busy']
Member

What is the point of calculating the available thread count here? It could just as well be calculated in the monitoring system.
