feat(ai): add AI orchestrator metrics #3097

Merged: 13 commits merged from ai-orchestrator-metrics into ai-video on Jul 18, 2024

Conversation


@rickstaa rickstaa commented Jul 14, 2024

What does this pull request do? Explain your changes. (required)

This pull request introduces new Orchestrator AI metrics to the ai-video branch:

  • ai_models_requested: Tracks the number of AI job requests per capability and model.
  • ai_request_latency_score: Measures latency scores per model job request.
  • ai_request_price: Records the price paid per unit for each model request.
  • ai_request_errors: Logs errors encountered while the Orchestrator processes the requested job.

To reduce code duplication, this pull request reuses the same metric definitions as the Gateway metrics pull request (see #3087); a rough sketch of such a definition is shown below.
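
For illustration only, here is a minimal Go sketch of how one of these measures could be declared and registered with OpenCensus, in the style of census.go. The measure and tag-key names mirror this pull request; the package name, unit string, bucket bounds, and the registerAIMetrics helper are assumptions rather than the actual code.

    package census // hypothetical package name for this sketch

    import (
        "go.opencensus.io/stats"
        "go.opencensus.io/stats/view"
        "go.opencensus.io/tag"
    )

    var (
        // Tag keys used to break the metric down per pipeline (capability) and model.
        kPipeline  = tag.MustNewKey("pipeline")
        kModelName = tag.MustNewKey("model_name")

        // Latency score measure; the unit string is a placeholder.
        mAIRequestLatencyScore = stats.Float64("ai_request_latency_score",
            "AI request latency score", "tot")
    )

    // registerAIMetrics shows how the corresponding view could be registered so
    // the measure is exported (e.g. to Prometheus).
    func registerAIMetrics() error {
        return view.Register(&view.View{
            Name:        "ai_request_latency_score",
            Measure:     mAIRequestLatencyScore,
            Description: "AI request latency score",
            TagKeys:     []tag.Key{kPipeline, kModelName},
            // Bucket bounds here are illustrative; the PR tunes its own buckets.
            Aggregation: view.Distribution(0, 0.5, 1, 2, 5, 10),
        })
    }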

Specific updates (required)

  • Updates census.go to include the new Orchestrator metrics.
  • Updates the ai_http.go file to record these metrics (a recording sketch follows below).
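
Continuing the sketch above (same hypothetical package, reusing its kPipeline, kModelName, and mAIRequestLatencyScore declarations), recording the measures from request-handling code such as ai_http.go could look roughly like this. The helper names are hypothetical; the real handler goes through the monitor/census helpers rather than calling OpenCensus directly.

    package census // continues the sketch above

    import (
        "context"

        "go.opencensus.io/stats"
        "go.opencensus.io/tag"
    )

    // Request counter per pipeline (capability) and model, alongside the
    // latency score measure declared in the previous sketch.
    var mAIModelsRequested = stats.Int64("ai_models_requested",
        "AI jobs requested per pipeline and model", "tot")

    // aiRequestAccepted bumps the request counter when an AI job arrives.
    func aiRequestAccepted(ctx context.Context, pipeline, model string) {
        _ = stats.RecordWithTags(ctx,
            []tag.Mutator{tag.Upsert(kPipeline, pipeline), tag.Upsert(kModelName, model)},
            mAIModelsRequested.M(1),
        )
    }

    // aiRequestFinished records the latency score once the job completes.
    func aiRequestFinished(ctx context.Context, pipeline, model string, latencyScore float64) {
        _ = stats.RecordWithTags(ctx,
            []tag.Mutator{tag.Upsert(kPipeline, pipeline), tag.Upsert(kModelName, model)},
            mAIRequestLatencyScore.M(latencyScore),
        )
    }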

How did you test each of these updates (required)

I set up both an on-chain and off-chain gateway to validate the metrics. I verified their visibility at http://localhost:7935/metrics and ensured they were correctly visualized in Grafana.

Does this pull request close any open issues?

This implements the functionality outlined in https://livepeer-ai.productlane.com/roadmap?id=d56cae33-2dbd-4187-8d3a-d1c5c35f890a

How to test

  1. Check out this pull request.
  2. Spin up an on-chain gateway with attached orchestrators.
  3. Clone the repository https://github.com/rickstaa/livepeer-monitor-test.
  4. Execute the Dockerfile in that repository to launch Prometheus and Grafana servers.
  5. Navigate to http://localhost:7935/metrics to view the new AI orchestrator metrics (or use the snippet after this list to pull them from the command line).
  6. Visit http://localhost:3000 to inspect these metrics in Grafana.
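
As a quick sanity check (a sketch, assuming the default metrics port 7935 from the steps above), the small Go program below fetches the endpoint and prints only the ai_* series:

    package main

    import (
        "bufio"
        "fmt"
        "log"
        "net/http"
        "strings"
    )

    func main() {
        // Fetch the node's Prometheus-style metrics page.
        resp, err := http.Get("http://localhost:7935/metrics")
        if err != nil {
            log.Fatal(err)
        }
        defer resp.Body.Close()

        // Exposed names may carry a node-level prefix, so match loosely on "ai_".
        sc := bufio.NewScanner(resp.Body)
        for sc.Scan() {
            line := sc.Text()
            if strings.Contains(line, "ai_") && !strings.HasPrefix(line, "#") {
                fmt.Println(line)
            }
        }
        if err := sc.Err(); err != nil {
            log.Fatal(err)
        }
    }

If the orchestrator has served jobs, you should see samples for ai_models_requested, ai_request_latency_score, ai_request_price, and ai_request_errors with per-pipeline and per-model labels.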

eliteprox and others added 9 commits July 8, 2024 15:37
This commit adds the initial AI gateway metrics so that they can be
reviewed by others. The code still needs to be cleaned up and the buckets
adjusted.
This commit improves the AI metrics so that they are easier to work
with.
This commit ensures that an error is logged when the Gateway could not
find orchestrators for a given model and capability.
This commit ensures that the `ticket_value_sent` and `tickets_sent`
metrics are also created for an AI Gateway.
This commit ensures that the AI gateway metrics contain the orch address
label.
@rickstaa rickstaa changed the title ai orchestrator metrics feat(ai): add AI orchestrator metrics Jul 14, 2024
@rickstaa rickstaa changed the base branch from master to ai-video July 14, 2024 09:34
This commit introduces a suite of AI orchestrator metrics to the census
module, mirroring those received by the Gateway. The newly added metrics
include `ai_models_requested`, `ai_request_latency_score`,
`ai_request_price`, and `ai_request_errors`, facilitating comprehensive
tracking and analysis of AI request handling performance on the orchestrator side.
Name: "ai_request_latency_score",
Measure: census.mAIRequestLatencyScore,
Description: "AI request latency score",
TagKeys: append([]tag.Key{census.kPipeline, census.kModelName}, baseTagsWithNodeInfo...),

rickstaa (author) commented:

@eliteprox, @ad-astra-video do you think listing this per gateway label makes sense?

This commit ensures that the right tags are attached to the Orchestrator
AI metrics.
This commit ensures that no divide-by-zero errors can occur in the
latency score calculations.
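
The exact latency-score formula lives in the orchestrator code; purely as an illustration of the guard this commit describes (the function name and the notion of "work units" are assumptions), the calculation amounts to checking the denominator before dividing:

    package main

    import (
        "fmt"
        "time"
    )

    // latencyScore divides the roundtrip time by the amount of work produced
    // (for example output pixels or duration). A zero or negative workload
    // yields a score of 0 instead of a divide-by-zero / Inf result.
    func latencyScore(roundtrip time.Duration, workUnits float64) float64 {
        if workUnits <= 0 {
            return 0
        }
        return roundtrip.Seconds() / workUnits
    }

    func main() {
        fmt.Println(latencyScore(2*time.Second, 0))   // 0 (no divide-by-zero)
        fmt.Println(latencyScore(2*time.Second, 4e6)) // 5e-07
    }
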
@rickstaa rickstaa merged commit 5aadffb into ai-video Jul 18, 2024
6 of 8 checks passed
@rickstaa rickstaa deleted the ai-orchestrator-metrics branch July 18, 2024 13:40
eliteprox added a commit to eliteprox/go-livepeer that referenced this pull request Jul 26, 2024
* Add gateway metric for roundtrip ai times by model and pipeline

* Rename metrics and add unique manifest

* Fix name mismatch

* modelsRequested not working correctly

* feat: add initial POC AI gateway metrics

This commit adds the initial AI gateway metrics so that they can be
reviewed by others. The code still needs to be cleaned up and the buckets
adjusted.

* feat: improve AI metrics

This commit improves the AI metrics so that they are easier to work
with.

* feat(ai): log no capacity error to metrics

This commit ensures that an error is logged when the Gateway could not
find orchestrators for a given model and capability.

* feat(ai): add TicketValueSent and TicketsSent metrics

This commit ensures that the `ticket_value_sent` and `tickets_sent`
metrics are also created for an AI Gateway.

* fix(ai): ensure that AI metrics have orch address label

This commit ensures that the AI gateway metrics contain the orch address
label.

* feat(ai): add orchestrator AI census metrics

This commit introduces a suite of AI orchestrator metrics to the census
module, mirroring those received by the Gateway. The newly added metrics
include `ai_models_requested`, `ai_request_latency_score`,
`ai_request_price`, and `ai_request_errors`, facilitating comprehensive
tracking and analysis of AI request handling performance on the orchestrator side.

* refactor: improve orchestrator metrics tags

This commit ensures that the right tags are attached to the Orchestrator
AI metrics.

* refactor(ai): improve latency score calculations

This commit ensures that no divide-by-zero errors can occur in the
latency score calculations.

---------

Co-authored-by: Elite Encoder <[email protected]>
eliteprox added a commit to eliteprox/go-livepeer that referenced this pull request Jul 26, 2024