
LLM pipeline with stream support #3114

Merged: 4 commits merged into livepeer:ai-video on Oct 1, 2024

Conversation

@kyriediculous (Contributor) commented on Jul 31, 2024:

What does this pull request do? Explain your changes. (required)
Adds support for an LLM pipeline (see livepeer/ai-worker#137).

The LLM pipeline returns either a stream or a final response. Both are handled over HTTP on the go-livepeer side using SSE (server-sent events).
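For reference, here is a minimal sketch of relaying such a stream over SSE in Go. The handler and channel names are hypothetical, not the actual go-livepeer code:

```go
package main

import (
	"fmt"
	"net/http"
)

// streamLLMResponse relays streamed LLM chunks to the client as
// server-sent events.
func streamLLMResponse(w http.ResponseWriter, chunks <-chan string) {
	w.Header().Set("Content-Type", "text/event-stream")
	w.Header().Set("Cache-Control", "no-cache")

	flusher, ok := w.(http.Flusher)
	if !ok {
		http.Error(w, "streaming unsupported", http.StatusInternalServerError)
		return
	}
	for chunk := range chunks {
		// Each SSE event is a "data:" line followed by a blank line.
		fmt.Fprintf(w, "data: %s\n\n", chunk)
		flusher.Flush()
	}
}

func main() {
	http.HandleFunc("/llm", func(w http.ResponseWriter, r *http.Request) {
		chunks := make(chan string)
		go func() {
			defer close(chunks)
			for _, c := range []string{"Hello", ", ", "world"} {
				chunks <- c
			}
		}()
		streamLLMResponse(w, chunks)
	})
	http.ListenAndServe(":8080", nil)
}
```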

TODO: revert ai-worker version after merge

How did you test each of these updates? (required)
Ran manual tests.

Checklist:

@github-actions bot added the AI label (Issues and PR related to the AI-video branch) on Jul 31, 2024
@@ -130,6 +130,11 @@ func (orch *orchestrator) AudioToText(ctx context.Context, req worker.AudioToTex
return orch.node.AudioToText(ctx, req)
}

// Return type is LlmResponse, but a stream is available as well as chan(string)
@kyriediculous (Contributor, Author) commented:
remove this comment
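For context, a minimal sketch of the dual return the (since removed) comment describes, where a single result is either a complete response or a channel of streamed chunks. The type names are stand-ins, not the actual ai-worker bindings:

```go
package main

import "fmt"

// LLMResponse is a stand-in for the worker's non-streaming response type.
type LLMResponse struct {
	Response string
}

// handleLLMResult type-switches on a result that is either a full response
// or a receive-only channel of streamed chunks.
func handleLLMResult(res interface{}) {
	switch v := res.(type) {
	case *LLMResponse:
		fmt.Println(v.Response) // non-streaming: full response at once
	case <-chan string:
		for chunk := range v { // streaming: consume chunks as they arrive
			fmt.Print(chunk)
		}
	}
}

func main() {
	handleLLMResult(&LLMResponse{Response: "full answer"})

	ch := make(chan string, 2)
	ch <- "partial "
	ch <- "answer\n"
	close(ch)
	handleLLMResult((<-chan string)(ch)) // convert so the type switch matches
}
```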

@kyriediculous kyriediculous marked this pull request as ready for review August 1, 2024 02:18
@rickstaa rickstaa force-pushed the ai-video-rebase branch 2 times, most recently from 4d54872 to 8e654d7, on August 2, 2024 10:09
@kyriediculous kyriediculous changed the title from "wip: llm pipeline with stream support" to "LLM pipeline with stream support" on Aug 5, 2024
@rickstaa rickstaa deleted the branch livepeer:ai-video August 7, 2024 20:53
@rickstaa rickstaa closed this Aug 7, 2024
@rickstaa rickstaa reopened this Aug 7, 2024
@rickstaa rickstaa deleted the branch livepeer:ai-video August 10, 2024 06:53
@rickstaa rickstaa closed this Aug 10, 2024
@rickstaa rickstaa reopened this Aug 10, 2024
@rickstaa rickstaa changed the base branch from ai-video-rebase to ai-video August 10, 2024 15:27
@ad-astra-video (Contributor) commented:

@rickstaa I have completed my review and confirmed the pipeline works with local testing after rebase. Most of the rebase updates are from codegen changes with the recent release of SDKs. There was also a small update to check if req.Stream was specified so it would not cause a seg fault on the log line.
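As a minimal illustration of that guard (type and field names assumed, not the generated code), optional fields arrive as pointers and must be defaulted before logging:

```go
package main

import "log"

// LLMRequest stands in for the generated request type, where optional
// fields are pointers.
type LLMRequest struct {
	Stream *bool
}

// logStreamFlag avoids the panic (the "seg fault" noted above):
// dereferencing req.Stream directly in the log line would crash when the
// client omits the field.
func logStreamFlag(req LLMRequest) {
	streaming := false
	if req.Stream != nil {
		streaming = *req.Stream
	}
	log.Printf("Received LLM request, stream=%v", streaming)
}

func main() {
	logStreamFlag(LLMRequest{}) // Stream omitted: logs stream=false, no panic
}
```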

@kyriediculous do you have time to fix your branch for the changes in the draft PR? Also, do you have docs to add to livepeer/docs explaining how to use this pipeline?

Notes with review:

  • I tested with llama 8b and Phi-3.
  • Pricing is based on tokens requested. In my testing, responses could extend beyond the requested limit (e.g. 542 tokens returned vs. a requested max_tokens of 500). I don't think this is a major issue, but I noted it as something to possibly improve in the future.

Orchestrator logs after the fix for returning the container discussed in the ai-worker PR:
Note that the streamed response in the last request returns quickly, and the container is returned about 22 seconds after the stream completes. This timing is similar to the non-streamed response times in the first two requests.

I0921 04:20:39.317863       1 rpc.go:259] Received Ping request
I0921 04:21:58.934653       1 ai_http.go:373] manifestID=33_meta-llama/Meta-Llama-3.1-8B-Instruct orchSessionID=ad5fc893 clientIP=127.0.0.1 Received request id=7381e3f6 cap=33 modelID=meta-llama/Meta-Llama-3.1-8B-Instruct
2024/09/21 04:22:21 INFO Returning container type=0 pipeline=llm-generate modelID=meta-llama/Meta-Llama-3.1-8B-Instruct
I0921 04:22:21.444668       1 ai_http.go:414] manifestID=33_meta-llama/Meta-Llama-3.1-8B-Instruct orchSessionID=ad5fc893 clientIP=127.0.0.1 Processed request id=7381e3f6 cap=33 modelID=meta-llama/Meta-Llama-3.1-8B-Instruct took=22.509847764s
I0921 04:22:35.375109       1 ai_http.go:373] manifestID=33_meta-llama/Meta-Llama-3.1-8B-Instruct orchSessionID=ad5fc893 clientIP=127.0.0.1 Received request id=c31fe925 cap=33 modelID=meta-llama/Meta-Llama-3.1-8B-Instruct
2024/09/21 04:22:57 INFO Returning container type=0 pipeline=llm-generate modelID=meta-llama/Meta-Llama-3.1-8B-Instruct
I0921 04:22:57.349142       1 ai_http.go:414] manifestID=33_meta-llama/Meta-Llama-3.1-8B-Instruct orchSessionID=ad5fc893 clientIP=127.0.0.1 Processed request id=c31fe925 cap=33 modelID=meta-llama/Meta-Llama-3.1-8B-Instruct took=21.973917726s
I0921 04:23:15.569713       1 ai_http.go:373] manifestID=33_meta-llama/Meta-Llama-3.1-8B-Instruct orchSessionID=ad5fc893 clientIP=127.0.0.1 Received request id=9c554b71 cap=33 modelID=meta-llama/Meta-Llama-3.1-8B-Instruct
I0921 04:23:15.711151       1 ai_http.go:414] manifestID=33_meta-llama/Meta-Llama-3.1-8B-Instruct orchSessionID=ad5fc893 clientIP=127.0.0.1 Processed request id=9c554b71 cap=33 modelID=meta-llama/Meta-Llama-3.1-8B-Instruct took=141.364952ms
2024/09/21 04:23:37 INFO Returning container type=0 pipeline=llm-generate modelID=meta-llama/Meta-Llama-3.1-8B-Instruct

@ad-astra-video ad-astra-video mentioned this pull request Sep 23, 2024
core/ai.go: review thread resolved (outdated)
go.mod: review thread resolved (outdated)
@kyriediculous (Contributor, Author) commented:

  • Pricing is based on tokens requested. In my testing, responses could extend beyond the requested limit (e.g. 542 tokens returned vs. a requested max_tokens of 500). I don't think this is a major issue, but I noted it as something to possibly improve in the future.

Yep, I noticed this with some models too: some don't treat max_tokens as a strict cut-off but as a guideline, and may need a few additional tokens to complete a sentence. We could enforce strict token counting on our end; first thing tomorrow I'll look at improving pricing/token counting. However, as you say, it's not a big issue as long as the amount of "overdraft" stays limited: the user just ends up with a negative credit balance and has to send more PM tickets with the next request.
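A rough sketch of that overdraft accounting, using the 542-vs-500 overrun from the review. The unit price is illustrative, not a real rate:

```go
package main

import (
	"fmt"
	"math/big"
)

func main() {
	pricePerToken := big.NewRat(1, 1000) // hypothetical unit price

	// Credit covers the 500 requested tokens; the model returned 542.
	credit := new(big.Rat).Mul(pricePerToken, big.NewRat(500, 1))
	cost := new(big.Rat).Mul(pricePerToken, big.NewRat(542, 1))

	// Debiting actual usage lets the balance dip below zero; the next
	// request must cover the shortfall with additional PM tickets.
	balance := new(big.Rat).Sub(credit, cost)
	fmt.Printf("balance after overdraft: %s\n", balance.RatString()) // -21/500
}
```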

@rickstaa (Contributor) left a comment:

@leszko, @thomshutt, @ad-astra-video I've briefly reviewed this pull request, and it seems ready to be merged. Since I'm currently out of the office and unable to perform a full E2E test, could one of you please confirm that everything is working as expected so we can proceed with the merge? Thanks! 🙏🏻

@@ -115,6 +116,7 @@ var CapabilityNameLookup = map[Capability]string{
Capability_ImageToVideo: "Image to video",
Capability_Upscale: "Upscale",
Capability_AudioToText: "Audio to text",
Capability_LLM: "Large Language Model",
A contributor commented:
This needs to be updated to "Large language model" to work with the PipelineToCapability function. I know this is brittle; I will look at making it better going forward.
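For illustration, if PipelineToCapability resolves the capability by an exact string match over this map's values, the casing has to line up verbatim. This is a hypothetical sketch of that brittleness assuming the surrounding core package (with Capability, CapabilityNameLookup, and an fmt import in scope), not the actual implementation:

```go
// pipelineToCapability illustrates an exact-match reverse lookup: the map
// entry must read "Large language model" verbatim or the lookup fails.
func pipelineToCapability(pipeline string) (Capability, error) {
	for c, name := range CapabilityNameLookup {
		if name == pipeline {
			return c, nil
		}
	}
	// Capability(-1) is an illustrative sentinel, not a real constant.
	return Capability(-1), fmt.Errorf("no capability found for pipeline %q", pipeline)
}
```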

@kyriediculous (Contributor, Author) replied:

There's no reason for this mapping to be part of the core package.

It's only used for monitoring and should live in the monitoring package if we want it for readability of monitoring output.

There's also inconsistent usage: some call sites go through a helper while others do a direct map lookup.

I really don't see much utility for this mapping other than, as you say, a brittle part of the codebase.

@rickstaa (Contributor) commented on Oct 1, 2024:

Thanks for applying the commit. We can remove it or improve it in the future 👍🏻. I made a backlog item for it.

core/capabilities.go: review thread resolved (outdated)
@rickstaa (Contributor) left a comment:

Looks good now, thanks 🚀!

@rickstaa rickstaa merged commit 80c0ac9 into livepeer:ai-video Oct 1, 2024
8 checks passed
@kyriediculous kyriediculous deleted the nv/llm-pipeline branch October 3, 2024 08:52