local code generate slowly #2034

yx1405585468 · 2024-09-20T10:00:53Z

Search before asking

I had searched in the issues and found no similar feature requirement.

Description

Hi, I deployed dbgpt on my own computer and loaded a local model Qwen2 0.5B, it reasoned very fast in dbgpt, I asked questions and it answered them very fast, however I'm in pycharm, and I'm getting it to work by writing code like.

from transformers import pipeline

messages = [
{“role”: “user”, “content”: “Who are you?”}, ]
]
pipe = pipeline(“text-generation”, model=“Qwen/Qwen2-0.5B”)
pipe(messages)

It's reasoning very slowly, although I'm sure cuda is being used to speed it up.
I don't understand why this is the case, and I'd like to achieve very fast reasoning locally as well

Use case

No response

Related issues

No response

Feature Priority

None

Are you willing to submit PR?

Yes I am willing to submit a PR!

yx1405585468 added enhancement New feature or request Waiting for reply labels Sep 20, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

local code generate slowly #2034

local code generate slowly #2034

yx1405585468 commented Sep 20, 2024

local code generate slowly #2034

local code generate slowly #2034

Comments

yx1405585468 commented Sep 20, 2024

Search before asking

Description

Use case

Related issues

Feature Priority

Are you willing to submit PR?