
LLM pipeline with stream support #3114

Merged: 4 commits merged into livepeer:ai-video on Oct 1, 2024

Conversation

@kyriediculous (Contributor) commented on Jul 31, 2024:

What does this pull request do? Explain your changes. (required)
Adds support for an LLM pipeline (see livepeer/ai-worker#137).

The LLM pipeline returns either a stream or a final response. Both are handled over HTTP on the go-livepeer side using SSE (server-sent events).
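For reference, here is a minimal sketch of relaying such a stream over SSE in Go. The handler and channel names are hypothetical, not the actual go-livepeer code:

```go
package main

import (
	"fmt"
	"net/http"
)

// streamLLMResponse relays streamed LLM chunks to the client as
// server-sent events.
func streamLLMResponse(w http.ResponseWriter, chunks <-chan string) {
	w.Header().Set("Content-Type", "text/event-stream")
	w.Header().Set("Cache-Control", "no-cache")

	flusher, ok := w.(http.Flusher)
	if !ok {
		http.Error(w, "streaming unsupported", http.StatusInternalServerError)
		return
	}
	for chunk := range chunks {
		// Each SSE event is a "data:" line followed by a blank line.
		fmt.Fprintf(w, "data: %s\n\n", chunk)
		flusher.Flush()
	}
}

func main() {
	http.HandleFunc("/llm", func(w http.ResponseWriter, r *http.Request) {
		chunks := make(chan string)
		go func() {
			defer close(chunks)
			for _, c := range []string{"Hello", ", ", "world"} {
				chunks <- c
			}
		}()
		streamLLMResponse(w, chunks)
	})
	http.ListenAndServe(":8080", nil)
}
```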

TODO: revert ai-worker version after merge

How did you test each of these updates? (required)
Ran manual tests.

Checklist:

@github-actions bot added the AI label (Issues and PR related to the AI-video branch) on Jul 31, 2024
@@ -130,6 +130,11 @@ func (orch *orchestrator) AudioToText(ctx context.Context, req worker.AudioToTex
return orch.node.AudioToText(ctx, req)
}

// Return type is LlmResponse, but a stream is available as well as chan(string)
@kyriediculous (Contributor, Author) commented:
remove this comment
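For context, a minimal sketch of the dual return the (since removed) comment describes, where a single result is either a complete response or a channel of streamed chunks. The type names are stand-ins, not the actual ai-worker bindings:

```go
package main

import "fmt"

// LLMResponse is a stand-in for the worker's non-streaming response type.
type LLMResponse struct {
	Response string
}

// handleLLMResult type-switches on a result that is either a full response
// or a receive-only channel of streamed chunks.
func handleLLMResult(res interface{}) {
	switch v := res.(type) {
	case *LLMResponse:
		fmt.Println(v.Response) // non-streaming: full response at once
	case <-chan string:
		for chunk := range v { // streaming: consume chunks as they arrive
			fmt.Print(chunk)
		}
	}
}

func main() {
	handleLLMResult(&LLMResponse{Response: "full answer"})

	ch := make(chan string, 2)
	ch <- "partial "
	ch <- "answer\n"
	close(ch)
	handleLLMResult((<-chan string)(ch)) // convert so the type switch matches
}
```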

@kyriediculous kyriediculous marked this pull request as ready for review August 1, 2024 02:18
@rickstaa rickstaa force-pushed the ai-video-rebase branch 2 times, most recently from 4d54872 to 8e654d7, on August 2, 2024 10:09
@kyriediculous kyriediculous changed the title from "wip: llm pipeline with stream support" to "LLM pipeline with stream support" on Aug 5, 2024
@rickstaa rickstaa deleted the branch livepeer:ai-video August 7, 2024 20:53
@rickstaa rickstaa closed this Aug 7, 2024
@rickstaa rickstaa reopened this Aug 7, 2024
@rickstaa rickstaa deleted the branch livepeer:ai-video August 10, 2024 06:53
@rickstaa rickstaa closed this Aug 10, 2024
@rickstaa rickstaa reopened this Aug 10, 2024
@rickstaa rickstaa changed the base branch from ai-video-rebase to ai-video August 10, 2024 15:27
@ad-astra-video (Contributor) commented:

@rickstaa I have completed my review and confirmed the pipeline works with local testing after rebase. Most of the rebase updates are from codegen changes with the recent release of SDKs. There was also a small update to check if req.Stream was specified so it would not cause a seg fault on the log line.
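As a minimal illustration of that guard (type and field names assumed, not the generated code), optional fields arrive as pointers and must be defaulted before logging:

```go
package main

import "log"

// LLMRequest stands in for the generated request type, where optional
// fields are pointers.
type LLMRequest struct {
	Stream *bool
}

// logStreamFlag avoids the panic (the "seg fault" noted above):
// dereferencing req.Stream directly in the log line would crash when the
// client omits the field.
func logStreamFlag(req LLMRequest) {
	streaming := false
	if req.Stream != nil {
		streaming = *req.Stream
	}
	log.Printf("Received LLM request, stream=%v", streaming)
}

func main() {
	logStreamFlag(LLMRequest{}) // Stream omitted: logs stream=false, no panic
}
```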

@kyriediculous do you have time to fix your branch for the changes in the draft PR? Also, do you have docs to add to livepeer/docs explaining how to use this pipeline?

Notes with review:

  • I tested with llama 8b and Phi-3.
  • Pricing is based on tokens requested. In my testing, responses could extend beyond the requested limit (e.g. 542 tokens returned vs. a requested max_tokens of 500). I don't think this is a major issue, but I noted it as something to possibly improve in the future.

Orchestrator logs after the fix for returning the container discussed in the ai-worker PR:
Note that the streamed response in the last request returns quickly, and the container is returned about 22 seconds after the stream completes. This timing is similar to the non-streamed response times in the first two requests.

I0921 04:20:39.317863       1 rpc.go:259] Received Ping request
I0921 04:21:58.934653       1 ai_http.go:373] manifestID=33_meta-llama/Meta-Llama-3.1-8B-Instruct orchSessionID=ad5fc893 clientIP=127.0.0.1 Received request id=7381e3f6 cap=33 modelID=meta-llama/Meta-Llama-3.1-8B-Instruct
2024/09/21 04:22:21 INFO Returning container type=0 pipeline=llm-generate modelID=meta-llama/Meta-Llama-3.1-8B-Instruct
I0921 04:22:21.444668       1 ai_http.go:414] manifestID=33_meta-llama/Meta-Llama-3.1-8B-Instruct orchSessionID=ad5fc893 clientIP=127.0.0.1 Processed request id=7381e3f6 cap=33 modelID=meta-llama/Meta-Llama-3.1-8B-Instruct took=22.509847764s
I0921 04:22:35.375109       1 ai_http.go:373] manifestID=33_meta-llama/Meta-Llama-3.1-8B-Instruct orchSessionID=ad5fc893 clientIP=127.0.0.1 Received request id=c31fe925 cap=33 modelID=meta-llama/Meta-Llama-3.1-8B-Instruct
2024/09/21 04:22:57 INFO Returning container type=0 pipeline=llm-generate modelID=meta-llama/Meta-Llama-3.1-8B-Instruct
I0921 04:22:57.349142       1 ai_http.go:414] manifestID=33_meta-llama/Meta-Llama-3.1-8B-Instruct orchSessionID=ad5fc893 clientIP=127.0.0.1 Processed request id=c31fe925 cap=33 modelID=meta-llama/Meta-Llama-3.1-8B-Instruct took=21.973917726s
I0921 04:23:15.569713       1 ai_http.go:373] manifestID=33_meta-llama/Meta-Llama-3.1-8B-Instruct orchSessionID=ad5fc893 clientIP=127.0.0.1 Received request id=9c554b71 cap=33 modelID=meta-llama/Meta-Llama-3.1-8B-Instruct
I0921 04:23:15.711151       1 ai_http.go:414] manifestID=33_meta-llama/Meta-Llama-3.1-8B-Instruct orchSessionID=ad5fc893 clientIP=127.0.0.1 Processed request id=9c554b71 cap=33 modelID=meta-llama/Meta-Llama-3.1-8B-Instruct took=141.364952ms
2024/09/21 04:23:37 INFO Returning container type=0 pipeline=llm-generate modelID=meta-llama/Meta-Llama-3.1-8B-Instruct

@ad-astra-video ad-astra-video mentioned this pull request Sep 23, 2024
core/ai.go: review thread resolved (outdated)
go.mod: review thread resolved (outdated)
@kyriediculous (Contributor, Author) commented:

  • Pricing is based on tokens requested. In my testing, responses could extend beyond the requested limit (e.g. 542 tokens returned vs. a requested max_tokens of 500). I don't think this is a major issue, but I noted it as something to possibly improve in the future.

Yep, I noticed this with some models too: some don't treat max_tokens as a strict cut-off but as a guideline, and may need a few additional tokens to complete a sentence. We could enforce strict token counting on our end; first thing tomorrow I'll look at improving pricing/token counting. However, as you say, it's not a big issue as long as the amount of "overdraft" stays limited: the user just ends up with a negative credit balance and has to send more PM tickets with the next request.
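A rough sketch of that overdraft accounting, using the 542-vs-500 overrun from the review. The unit price is illustrative, not a real rate:

```go
package main

import (
	"fmt"
	"math/big"
)

func main() {
	pricePerToken := big.NewRat(1, 1000) // hypothetical unit price

	// Credit covers the 500 requested tokens; the model returned 542.
	credit := new(big.Rat).Mul(pricePerToken, big.NewRat(500, 1))
	cost := new(big.Rat).Mul(pricePerToken, big.NewRat(542, 1))

	// Debiting actual usage lets the balance dip below zero; the next
	// request must cover the shortfall with additional PM tickets.
	balance := new(big.Rat).Sub(credit, cost)
	fmt.Printf("balance after overdraft: %s\n", balance.RatString()) // -21/500
}
```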

@rickstaa (Contributor) left a comment:

@leszko, @thomshutt, @ad-astra-video I've briefly reviewed this pull request, and it seems ready to be merged. Since I'm currently out of the office and unable to perform a full E2E test, could one of you please confirm that everything is working as expected so we can proceed with the merge? Thanks! 🙏🏻

@@ -115,6 +116,7 @@ var CapabilityNameLookup = map[Capability]string{
Capability_ImageToVideo: "Image to video",
Capability_Upscale: "Upscale",
Capability_AudioToText: "Audio to text",
Capability_LLM: "Large Language Model",
A contributor commented:
This needs to be updated to "Large language model" to work with the PipelineToCapability function. I know this is brittle; I will look at making it better going forward.
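For illustration, if PipelineToCapability resolves the capability by an exact string match over this map's values, the casing has to line up verbatim. This is a hypothetical sketch of that brittleness assuming the surrounding core package (with Capability, CapabilityNameLookup, and an fmt import in scope), not the actual implementation:

```go
// pipelineToCapability illustrates an exact-match reverse lookup: the map
// entry must read "Large language model" verbatim or the lookup fails.
func pipelineToCapability(pipeline string) (Capability, error) {
	for c, name := range CapabilityNameLookup {
		if name == pipeline {
			return c, nil
		}
	}
	// Capability(-1) is an illustrative sentinel, not a real constant.
	return Capability(-1), fmt.Errorf("no capability found for pipeline %q", pipeline)
}
```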

@kyriediculous (Contributor, Author) replied:

There's no reason for this mapping to be part of the core package.

It's only used for monitoring and should live in the monitoring package if we want it for readability of monitoring output.

There's also inconsistent usage: some call sites go through a helper while others do a direct map lookup.

I really don't see much utility for this mapping other than, as you say, a brittle part of the codebase.

@rickstaa (Contributor) commented on Oct 1, 2024:

Thanks for applying the commit. We can remove it or improve it in the future 👍🏻. I made a backlog item for it.

core/capabilities.go: review thread resolved (outdated)
@rickstaa (Contributor) left a comment:

Looks good now, thanks 🚀!

@rickstaa rickstaa merged commit 80c0ac9 into livepeer:ai-video Oct 1, 2024
8 checks passed
@kyriediculous kyriediculous deleted the nv/llm-pipeline branch October 3, 2024 08:52