Releases · ROCm/vllm
v0.6.3.post2+rocm
What's Changed
- fp8 moe configs. Mixtral-8x(7B,22B) TP=1,2,4,8 by @divakar-amd in #250
- Sccache removal from Dockerfile.rocm by @omirosh in #253
- Update Dockerfile.rocm by @shajrawi in #254
- Using the correct type hints by @gshtras in #256
- Revert "Update Dockerfile.rocm" by @gshtras in #257
- Creating ROCm whl upon release by @gshtras in #259
Full Changelog: v0.6.3.post1+rocm...v0.6.3.post2+rocm
What's Changed
- Miscellaneous cosmetic changes by @mawong-amd in #166
- V5.5 upstream merge rc by @gshtras in #167
- fnuz support for fbgemm fp8 by @gshtras in #169
- Fixing mypy after a rushed merge by @gshtras in #171
- [fix] moe padding for reading correct tuned config by @divakar-amd in #172
- Upstream merge 24/9/9 by @gshtras in #174
- Restoring deleted .buildkite/test-template.j2 by @Alexei-V-Ivanov-AMD in #177
- Support commandr on ROCm by @shajrawi in #180
- Correct type hint by @gshtras in #173
- update custom PA kernel with support for fp8 kv cache dtype by @sanyalington in #87
- Support Grok-1 by @kkHuang-amd in #181
- Adding MLPerf optimization to 0.6.0 by @charlifu in #182
- 6.2 dockerfile by @gshtras in #176
- [Grok1] fix the name of input scale factor for autofp8 run by @kkHuang-amd in #183
- [Grok-1] fix the run-time error "Can't pickle <class 'transformers_mo… by @kkHuang-amd in #184
- Upstream merge 24/09/16 by @gshtras in #187
- Perf improvement: remove redundant torch slice; Match decode PA partition size to csrc by @sanyalington in #188
- refactor dbrx experts to use FusedMoe layer by @divakar-amd in #186
- Disable moe padding by default and enable fp8 padding by default. by @charlifu in #190
- Enabling Splitting HW by Buildkite Agents by @Alexei-V-Ivanov-AMD in #191
- Revert "remove redundant slice; match decode PA partition size with csrc (#188)" by @gshtras in #194
- [Grok-1] 1. upload moe configuration file for moe kernel optimization… by @kkHuang-amd in #193
- Removing the original text in reminder_comment.yml by @Alexei-V-Ivanov-AMD in #195
- Fix PA custom and PA v2 tests and partition sizes by @mawong-amd in #196
- Adding P3L measurement to the benchmarks collection tools. by @Alexei-V-Ivanov-AMD in #197
- Swapping the order of sampling operations in the conditional selector. by @Alexei-V-Ivanov-AMD in #199
- remove redundant slice when chunked prefill feature is disabled by @sanyalington in #201
- Fixing P3L incompatibility with cython. by @Alexei-V-Ivanov-AMD in #200
- Bias and more metadata in gradlib and tuned gemm by @gshtras in #202
- Upstream merge 24 9 23 by @gshtras in #203
- Gating n=0 case from skinny gemm by @gshtras in #204
- Revert "[Kernel] changing fused moe kernel chunk size default to 32k (vllm-project#7995)" by @gshtras in #207
- re-enable avoid torch slice fix when chunked prefill is disabled by @sanyalington in #209
- add block_manager_v2.py into setup_cython by @sanyalington in #210
- extend moe padding to DUMMY weights by @divakar-amd in #211
- [Int4-AWQ] Fix AWQ Marlin check for ROCm by @hegemanjw4amd in #206
- RPD Profiling by @dllehr-amd in #208
- Cythonize vllm build by @maleksan85 in #214
- Fix Dockerfile.rocm by @gshtras in #215
- fix dbrx weight loader by @divakar-amd in #212
- Upstream merge 24 09 27 0.6.2 by @gshtras in #213
- Make rpdtracer import only when required by @Rohan138 in #216
- Improve profiling setup and documentation, sync benchmarks with main by @AdrianAbeyta in #218
- Installing the requirements before invoking setup.py since it now imports setuptools_scm by @gshtras in #221
- llama3.2 + cross attn test by @maleksan85 in #220
- Optimize CAR for ROCm by @iotamudelta in #225
- Custom PA perf improvements by @sanyalington in #222
- Upstream merge 24 10 08 by @gshtras in #226
- customPA write fp8 small ctx fix; enable customPA write fp8 by default by @sanyalington in #227
- added timeout for vllm build in rocm by @maleksan85 in #230
- Add fp8 for dbrx by @charlifu in #231
- Update Buildkite env variable by @dhonnappa-amd in #232
- cuda graph + num-scheduler-steps bug fix by @seungrokj in #236
- [Model] [BUG] Fix code path logic to load mllama model by @tjtanaa in #234
- prefix-enabled FA perf issue by @seungrokj in #239
- Custom PA Partition size 256 to improve performance by @sanyalington in #238
- [Build/CI] Minor changes to fix internal CI process. by @Alexei-V-Ivanov-AMD in #235
- [BUGFIX] Restored handling of ROCM FA output as before adaptation of llama3.2 by @maleksan85 in #241
- Upstream merge 24 10 21 by @gshtras in #240
- Using the correct datatype on prefix prefill for fp8 kv cache by @gshtras in #242
- Update CMakeLists.txt by @gshtras in #244
- update block_manager usage in setup_cython by @saienduri in #243
- [Bugfix][Kernel][Misc] Basic support for SmoothQuant, symmetric case by @rasmith in #237
- Add fp8 support for llama model family on Navi4x by @qli88 in #245
- Custom all reduce fix mi250 by @omirosh in #247
- Upstream merge 24 10 28 by @gshtras in #248
- fp8 moe configs. Mixtral-8x(7B,22B) TP=1,2,4,8 by @divakar-amd in #250
- Sccache removal from Dockerfile.rocm by @omirosh in #253
- Update Dockerfile.rocm by @shajrawi in #254
- Using the correct type hints by @gshtras in #256
- Revert "Update Dockerfile.rocm" by @gshtras in #257
- Creating ROCm whl upon release by @gshtras in #259
New Contributors
- @kkHuang-amd made their first contribution in #181
- @Rohan138 made their first contribution in #216
- @AdrianAbeyta made their first contribution in #218
- @dhonnappa-amd made their first contribution in #232
- @seungrokj made their first contribution in #236
- @tjtanaa made their first contribution in #234
- @saienduri made their first contribution in #243
- @qli88 made their first contribution in #245
- @omirosh made their first contribution in #247
Full Changelog: v0.4.3_rocm...v0.6.3.post2+rocm
v0.6.3.post1+rocm
What's Changed
- Upstream merge 24 10 21 by @gshtras in #240
- Using the correct datatype on prefix prefill for fp8 kv cache by @gshtras in #242
- Update CMakeLists.txt by @gshtras in #244
- update block_manager usage in setup_cython by @saienduri in #243
- [Bugfix][Kernel][Misc] Basic support for SmoothQuant, symmetric case by @rasmith in #237
- Add fp8 support for llama model family on Navi4x by @qli88 in #245
- Custom all reduce fix mi250 by @omirosh in #247
- Upstream merge 24 10 28 by @gshtras in #248
New Contributors
- @saienduri made their first contribution in #243
- @qli88 made their first contribution in #245
- @omirosh made their first contribution in #247
Full Changelog: v0.6.2.post1+rocm...v0.6.3.post1+rocm
v0.6.2.post1+rocm
What's Changed
- Make rpdtracer import only when required by @Rohan138 in #216
- Improve profiling setup and documentation, sync benchmarks with main by @AdrianAbeyta in #218
- Installing the requirements before invoking setup.py since it now imports setuptools_scm by @gshtras in #221
- llama3.2 + cross attn test by @maleksan85 in #220
- Optimize CAR for ROCm by @iotamudelta in #225
- Custom PA perf improvements by @sanyalington in #222
- Upstream merge 24 10 08 by @gshtras in #226
- customPA write fp8 small ctx fix; enable customPA write fp8 by default by @sanyalington in #227
- added timeout for vllm build in rocm by @maleksan85 in #230
- Add fp8 for dbrx by @charlifu in #231
- Update Buildkite env variable by @dhonnappa-amd in #232
- cuda graph + num-scheduler-steps bug fix by @seungrokj in #236
- [Model] [BUG] Fix code path logic to load mllama model by @tjtanaa in #234
- prefix-enabled FA perf issue by @seungrokj in #239
- Custom PA Partition size 256 to improve performance by @sanyalington in #238
- [Build/CI] Minor changes to fix internal CI process. by @Alexei-V-Ivanov-AMD in #235
- [BUGFIX] Restored handling of ROCM FA output as before adaptation of llama3.2 by @maleksan85 in #241
New Contributors
- @Rohan138 made their first contribution in #216
- @AdrianAbeyta made their first contribution in #218
- @dhonnappa-amd made their first contribution in #232
- @seungrokj made their first contribution in #236
- @tjtanaa made their first contribution in #234
Full Changelog: v0.6.2+rocm...v0.6.2.post1+rocm
v0.6.2+rocm
What's Changed
- fix dbrx weight loader by @divakar-amd in #212
- Upstream merge 24 09 27 0.6.2 by @gshtras in #213
Full Changelog: v0.6.1.post1+rocm...v0.6.2+rocm
v0.6.1.post1+rocm
What's Changed
- Adding P3L measurement to the benchmarks collection tools. by @Alexei-V-Ivanov-AMD in #197
- Swapping the order of sampling operations in the conditional selector. by @Alexei-V-Ivanov-AMD in #199
- remove redundant slice when chunked prefill feature is disabled by @sanyalington in #201
- Fixing P3L incompatibility with cython. by @Alexei-V-Ivanov-AMD in #200
- Bias and more metadata in gradlib and tuned gemm by @gshtras in #202
- Upstream merge 24 9 23 by @gshtras in #203
- Gating n=0 case from skinny gemm by @gshtras in #204
- Revert "[Kernel] changing fused moe kernel chunk size default to 32k (vllm-project#7995)" by @gshtras in #207
- re-enable avoid torch slice fix when chunked prefill is disabled by @sanyalington in #209
- add block_manager_v2.py into setup_cython by @sanyalington in #210
- extend moe padding to DUMMY weights by @divakar-amd in #211
- [Int4-AWQ] Fix AWQ Marlin check for ROCm by @hegemanjw4amd in #206
- RPD Profiling by @dllehr-amd in #208
- Cythonize vllm build by @maleksan85 in #214
- Fix Dockerfile.rocm by @gshtras in #215
Full Changelog: v0.6.1_rocm...v0.6.1.post1+rocm
v0.6.1_rocm
What's Changed
- [fix] moe padding for reading correct tuned config by @divakar-amd in #172
- Upstream merge 24/9/9 by @gshtras in #174
- Restoring deleted .buildkite/test-template.j2 by @Alexei-V-Ivanov-AMD in #177
- Support commandr on ROCm by @shajrawi in #180
- Correct type hint by @gshtras in #173
- update custom PA kernel with support for fp8 kv cache dtype by @sanyalington in #87
- Support Grok-1 by @kkHuang-amd in #181
- Adding MLPerf optimization to 0.6.0 by @charlifu in #182
- 6.2 dockerfile by @gshtras in #176
- [Grok1] fix the name of input scale factor for autofp8 run by @kkHuang-amd in #183
- [Grok-1] fix the run-time error "Can't pickle <class 'transformers_mo… by @kkHuang-amd in #184
- Upstream merge 24/09/16 by @gshtras in #187
- Perf improvement: remove redundant torch slice; Match decode PA partition size to csrc by @sanyalington in #188
- refactor dbrx experts to use FusedMoe layer by @divakar-amd in #186
- Disable moe padding by default and enable fp8 padding by default. by @charlifu in #190
- Enabling Splitting HW by Buildkite Agents by @Alexei-V-Ivanov-AMD in #191
- Revert "remove redundant slice; match decode PA partition size with csrc (#188)" by @gshtras in #194
- [Grok-1] 1. upload moe configuration file for moe kernel optimization… by @kkHuang-amd in #193
- Removing the original text in reminder_comment.yml by @Alexei-V-Ivanov-AMD in #195
- Fix PA custom and PA v2 tests and partition sizes by @mawong-amd in #196
New Contributors
- @kkHuang-amd made their first contribution in #181
Full Changelog: v0.6.0_rocm...v0.6.1_rocm
v0.6.0_rocm
What's Changed
- Features integration without fp8 by @gshtras in #7
- Layernorm optimizations by @mawong-amd in #8
- Bringing in the latest commits from upstream by @mawong-amd in #9
- Bump Docker to ROCm 6.1, add gradlib for tuned gemm, include RCCL fixes by @mawong-amd in #12
- add mi300 fused_moe tuned configs by @divakar-amd in #13
- Correctly calculating the same value for the required cache blocks num for all torchrun processes by @gshtras in #15
- [ROCm] adding a missing triton autotune config by @hongxiayang in #17
- make the vllm setup mode configurable and make install mode as defaul… by @hongxiayang in #18
- enable fused topK_softmax kernel for hip by @divakar-amd in #14
- Fix ambiguous fma call by @cjatin in #16
- Rccl dockerfile updates by @mawong-amd in #19
- Dockerfile improvements: multistage by @mawong-amd in #20
- Integrate PagedAttention Optimization custom kernel into vLLM by @lcskrishna in #22
- Updates to custom PagedAttention for supporting context len up to 32k. by @lcskrishna in #25
- Update max_context_len for custom paged attention. by @lcskrishna in #26
- Update RCCL, hipBLASLt, base image in Dockerfile.rocm by @shajrawi in #24
- Adding fp8 gemm computation by @charlifu in #29
- fix the model loading fp8 by @charlifu in #30
- Update linear.py by @gshtras in #32
- Update base docker image with Pytorch 2.3 by @charlifu in #35
- Removed HIP specific matvec logic that is duplicated from tuned_gemm.py and doesn't support bf16 by @gshtras in #23
- Use inp_view for out = F.linear() in TunedGemm by @charlifu in #36
- Fix the symbol not found issue of the new base image by @charlifu in #37
- G42 bias triton fix rocm main by @gshtras in #38
- Update ROCm vLLM to 0.4.3 by @mawong-amd in #40
- Re-applying G42 bias triton fix on 0.4.3 by @gshtras in #41
- Fix RCCL install, linear.py logic, CMake custom extension, update requirement for FP8 compute by @mawong-amd in #42
- Linting main in line with upstream requirements by @mawong-amd in #43
- Include benchmark scripts in container by @mawong-amd in #45
- Adding fp8 to gradlib by @charlifu in #44
- Update fp8_gemm_tuner.py exchange import torch and hipbsolidxgemm by @liligwu in #46
- Supporting quantized weights from Quark by default. by @charlifu in #47
- Update quark quantizer command in fp8 instruction by @charlifu in #49
- Fix LLMM1 kernel by @fxmarty in #28
- Use scaled mm for untuned fp8 gemm by @charlifu in #50
- tuned moe configs v2 by @divakar-amd in #33
- Revert "Tune fused_moe_kernel for TP 1,2,4,8 and bf16 and fp16, updated moe kern…" by @hthangirala in #51
- Revert "Revert "Tune fused_moe_kernel for TP 1,2,4,8 and bf16 and fp16, updated moe kern…"" by @divakar-amd in #53
- fix init files by @divakar-amd in #52
- adds wvSpltK optimization for skinny gemm. by @amd-hhashemi in #54
- Fix 8K decode latency jump issue. by @lcskrishna in #55
- Adding quantization_weights_path for fp8 weights by @charlifu in #57
- Refactor custom gemm heuristics by @gshtras in #56
- wvSpltK fix for 10GB+ output tensors by @amd-hhashemi in #61
- uint64_t instead of unsigned long for clarity by @mawong-amd in #62
- fix for oob LDS fill in wvSpltK slm version by @amd-hhashemi in #63
- [Kernel] Enable custom AR on ROCm by @wenkaidu in #27
- Fix the Runtime Error When Loading kv cache scales by @charlifu in #65
- Fix numpy and XGMI 1-hop detection by @mawong-amd in #67
- Fix XGMI linting by @mawong-amd in #68
- Merging fp8_gemm_tuner.py to gemm_tuner.py by @charlifu in #66
- Workaround for SWDEV-470361 by @gshtras in #69
- [1/2] Fix up ROCm 6.2 tests correctly in main by @mawong-amd in #72
- [2/2] Using xfail instead of skip for ROCm 6.2 tests by @mawong-amd in #70
- Dockerfile updates: base image, preemptive uninstalls; restore ROCm 6.2 metrics test by @mawong-amd in #73
- Return int64 dtype for solidx in tuning results by @charlifu in #74
- [Build/CI] tests for rocm/vllm:main as of 2024-06-28 by @Alexei-V-Ivanov-AMD in #77
- Fix gradlib fp8 output by @charlifu in #76
- Allocate workspace for hipblaslt fp8 gemm. by @charlifu in #78
- Mixtral moe tuning for mi308 by @divakar-amd in #80
- Remove elementwise kernel before each fp8 gemm by @charlifu in #81
- Charlifu/avoid tensor creation before each gemm by @HaiShaw in #82
- TP=1 moe tuning for mixtral-8x7B by @divakar-amd in #84
- Mixtral-8x22B tuning mi308x by @divakar-amd in #85
- moe tuning for larger input lens by @divakar-amd in #86
- Reduce csv writes by @charlifu in #92
- fix the type error due to the misuse of the logging module by @liligwu in #105
- Update Dockerfile.rocm by @shajrawi in #107
- Greg/fast server by @gshtras in #106
- converts wvSpltK reduce to pure dpp for further perf uplift. by @amd-hhashemi in #64
- Revert "Fix 8K decode latency jump issue." by @mawong-amd in #108
- adding a simple model invocation involving fp8 calculation/storage by @Alexei-V-Ivanov-AMD in #109
- Adding bf16 output dtype for fp8 gemm by @charlifu in #111
- Running server and LLM in different processes by @gshtras in #110
- Fixed single GPU issue without setting up mp. Added toggles for server request batching parameters by @gshtras in #114
- Add distributed executor backend to benchmark scripts by @mawong-amd in #118
- Add weight padding for moe by @charlifu in #119
- [BugFix] Fix navi build after many custom for MI kernels added by @maleksan85 in #116
- add empty_cache() after each padding by @charlifu in #120
- [FIX] Gradlib OOM on Navi and sometimes on MI by @maleksan85 in #124
- Save shape when fp8 solution not found by @charlifu in #123
- Fix unit test for moe by adding padding by @charlifu in #128
- Llama3.1 by @gshtras in #129
- chat/completions endpoint by @gshtras in #121
- Optimize custom all reduce by @iotamudelta in #130
- Add BF16 support to custom PA by @sanyalington in #133
- Making check for output match in original types. It saves some memory. by @maleksan85 in #135
- Make CAR ROCm 6.1 compatible. by @iotamudelta in #137
- Car revert by @gshtras in #140
- Using the correct datatypes for streaming non-chat completions by @gshtras in #134
- Adding UNREACHABLE_CODE macro for non MI300 and MI250 cards by @maleksan85 in #138
- [FIX] gfx90a typo fix by @maleksan85 in #142
- wvsplitk templatized and better tuned for MI300 by @amd-hhashemi in #132
- [Bugfix] Dockerfile.rocm by @zstreet87 in #141
- Update test-template.j2 by @okakarpa in #145
- Adding Triton implementations awq_dequantize and awq_gemm to ROCm by @rasmith in #136
- Adding fp8 padding by @charlifu in #144
- [Int4-AWQ] Torch Int-4 AWQ Dequantization and Configuration Options by @hegemanjw4amd in #146
- buildkit requirement for building docker images by @hongxiayang in #149
- cupy build fix for SWDEV-475036 by @hongxiayang in https...
v0.6.0
Full Changelog: v0.5.5...v0.6.0
v0.4.0
What's Changed
- Features integration without fp8 by @gshtras in #7
- Layernorm optimizations by @mawong-amd in #8
- Bringing in the latest commits from upstream by @mawong-amd in #9
- Bump Docker to ROCm 6.1, add gradlib for tuned gemm, include RCCL fixes by @mawong-amd in #12
- add mi300 fused_moe tuned configs by @divakar-amd in #13
- Correctly calculating the same value for the required cache blocks num for all torchrun processes by @gshtras in #15
- [ROCm] adding a missing triton autotune config by @hongxiayang in #17
- make the vllm setup mode configurable and make install mode as defaul… by @hongxiayang in #18
- enable fused topK_softmax kernel for hip by @divakar-amd in #14
- Fix ambiguous fma call by @cjatin in #16
- Rccl dockerfile updates by @mawong-amd in #19
- Dockerfile improvements: multistage by @mawong-amd in #20
- Integrate PagedAttention Optimization custom kernel into vLLM by @lcskrishna in #22
- Updates to custom PagedAttention for supporting context len up to 32k. by @lcskrishna in #25
- Update max_context_len for custom paged attention. by @lcskrishna in #26
- Update RCCL, hipBLASLt, base image in Dockerfile.rocm by @shajrawi in #24
- Adding fp8 gemm computation by @charlifu in #29
- fix the model loading fp8 by @charlifu in #30
- Update linear.py by @gshtras in #32
- Update base docker image with Pytorch 2.3 by @charlifu in #35
New Contributors
- @divakar-amd made their first contribution in #13
- @hongxiayang made their first contribution in #17
- @cjatin made their first contribution in #16
- @lcskrishna made their first contribution in #22
- @shajrawi made their first contribution in #24
Full Changelog: v0.3.3...v0.4.0
v0.3.0
Full Changelog: https://github.com/ROCm/vllm/commits/v0.3.0