Evaluate Profile-Guided Optimization (PGO) and Post Link Optimization (PLO) #1433

zamazan4ik · 2023-11-21T19:00:16Z

Hi!

I recently evaluated Profile-Guided Optimization (PGO) on multiple projects. The results are available here. According to those tests, PGO can help achieve better performance. I also found interesting results about the effect of PGO on tsv-utils, a project in a similar domain to qsv. Given all this, I think trying to optimize qsv with PGO is a good idea.

I already did some benchmarks and want to share my results.

Test environment

MacBook Pro 14 M1 (6 + 2 CPU, 16 GiB RAM)
Compiler: Rustc 1.74
qsv version: master branch on commit 531acbb072c48cbaca5d58b593243e0f5f0ec8d3

Right now I cannot run the tests on my Linux machine (Fedora-based) due to some build errors: #1431. I expect the results to be similar on Linux, though.

Benchmark

For benchmarking, I use this QSV benchmark. For PGO, I use the cargo-pgo tool. The same benchmark suite was used for the PGO training phase, run against a binary built with cargo pgo build -- --release --locked -F feature_capable,apply,geocode,luau,to,polars --bin qsv but with LTO disabled. The only change to the benchmark suite was reducing the number of benchmark runs, since for the training phase it is enough to run every test case once.

I got the PGO-optimized results with QSV built with cargo pgo optimize build -- --release --locked -F feature_capable,apply,geocode,luau,to,polars --bin qsv, again with LTO disabled. The Release version is built with cargo build --release --locked -F feature_capable,apply,geocode,luau,to,polars --bin qsv.

Unfortunately, due to a bug in the Rustc compiler, PGO currently cannot be enabled together with LTO for QSV. So I compare "QSV with LTO" vs "QSV with PGO". Once the bug is fixed, we can apply LTO + PGO to QSV at the same time.
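For anyone who wants to reproduce this, the steps above can be sketched roughly as the script below. It is only a sketch under a few assumptions: the `step` helper and the `RUN` switch are illustrative names that are not part of cargo-pgo or qsv, the `CARGO_PROFILE_RELEASE_LTO=off` override is one possible way to disable LTO (the post does not say how it was disabled), and the training step is left as a placeholder because it depends on how the benchmark suite is invoked locally. By default the script only prints the commands.

```shell
#!/usr/bin/env sh
# Sketch of the PGO workflow described above.
# Prints each command by default; set RUN=1 to actually execute them.
set -eu

# Feature list taken from the build commands in the post.
FEATURES="feature_capable,apply,geocode,luau,to,polars"

# Assumption: override the release profile so that LTO is off for PGO builds.
export CARGO_PROFILE_RELEASE_LTO=off

# Hypothetical helper: echo the command, run it only when RUN=1.
step() {
    echo "+ $*"
    if [ "${RUN:-0}" = "1" ]; then
        "$@"
    fi
}

# One-time setup: cargo-pgo needs the LLVM tools to merge profiles.
step rustup component add llvm-tools-preview
step cargo install cargo-pgo

# 1. Build the instrumented binary.
step cargo pgo build -- --release --locked -F "$FEATURES" --bin qsv

# 2. Training phase (placeholder): run the benchmark suite once against the
#    instrumented binary so that it writes its profile data.
echo "# run the benchmark suite once against the instrumented qsv here"

# 3. Rebuild using the collected profiles.
step cargo pgo optimize build -- --release --locked -F "$FEATURES" --bin qsv
```

Running it with `RUN=1` executes the setup and both builds; the training step in between still has to be done by hand.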

Results

I got the following results:

QSV Release + LTO results: https://gist.github.com/zamazan4ik/f166967f12ad27a2bf4253975c2e1907
QSV Release + PGO optimized results: https://gist.github.com/zamazan4ik/25611016ee295e13b29da4c07adb681b
(just for reference) QSV Release + PGO instrumentation: https://gist.github.com/zamazan4ik/a0d217d3e25ff23670eae20003ddd40c

As I interpret the results, PGO measurably improves QSV performance in many cases.

Further steps

I can suggest the following action points:

Perform more PGO benchmarks on QSV. If they show improvements, add a note to the documentation about the possible performance gains from PGO.
Provide an easier way (e.g. a build option) to build QSV with PGO. This can help end users and maintainers, since they will be able to optimize QSV for their own workloads.
Optimize the pre-built QSV binaries with PGO.

Testing Post-Link Optimization techniques (like LLVM BOLT) would be interesting too (Clang and Rustc already use BOLT in addition to PGO), but I recommend starting with regular PGO.
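If someone wants to try BOLT later, cargo-pgo also wraps it. The sketch below follows the PGO + BOLT sequence as I understand it from the cargo-pgo documentation, so please treat it as an assumption to verify against the installed version. It also assumes `llvm-bolt` is available on `PATH` and that the target is Linux, since BOLT works on ELF binaries (so it would not apply to the macOS setup used above). The `step` helper and `RUN` switch are illustrative names, and both profile-gathering steps are placeholders.

```shell
#!/usr/bin/env sh
# Sketch of combining PGO with BOLT via cargo-pgo.
# Prints each command by default; set RUN=1 to actually execute them.
set -eu

# Hypothetical helper: echo the command, run it only when RUN=1.
step() {
    echo "+ $*"
    if [ "${RUN:-0}" = "1" ]; then
        "$@"
    fi
}

# 1. Build the PGO-instrumented binary, then gather PGO profiles.
step cargo pgo build
echo "# run the workload against the PGO-instrumented binary here"

# 2. Build a BOLT-instrumented binary that already uses the PGO profiles,
#    then gather BOLT profiles.
step cargo pgo bolt build --with-pgo
echo "# run the workload against the BOLT-instrumented binary here"

# 3. Produce the final binary optimized with both PGO and BOLT.
step cargo pgo bolt optimize --with-pgo
```

The QSV-specific build arguments from the Benchmark section are left out here to keep the sequence readable.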

Here are some examples of how PGO is integrated into other projects:

Rustc: a CI script for the multi-stage build
GCC:
- Official docs, section "Building with profile feedback" (even AutoFDO build is supported)
- A part in a "wonderful" configure script
Clang: Docs
Python:
- CPython: README
- Pyston: README
Go: Bash script
V8: Bazel flag
ChakraCore: Scripts
Chromium: Script
Firefox: Docs
- Thunderbird has PGO support too
PHP: Makefile command and old Centminmod scripts
MySQL: CMake script
YugabyteDB: GitHub commit
FoundationDB: Script
Zstd: Makefile
Foot: Scripts
Windows Terminal: GitHub PR
Pydantic-core: GitHub PR
file.d: GitHub PR
OceanBase: CMake flag

jqnatividad · 2023-11-22T11:39:00Z

Maintainer

Amazing stuff @zamazan4ik !

I've been micro-optimizing the code and squeezing out more performance with every release but have been putting off PGO as I thought that the prep work would be prohibitive.

Using samply was my next major step in my micro-optimization efforts. That's why I created a samply release profile.

Thankfully, you made the connection that qsv's benchmark suite would be the perfect way to create a PGO pipeline - not only for the prebuilt binaries, but also for folks who want to compile and tune their own PGO custom builds based on their data, workload and target platforms!

And the payoff is eye-popping! Some commands are 20-30% faster!

I'll do as per your suggested action points. This will be a great way to put a cherry on top of qsv as I eye a 1.0 release in the next few months...

Thanks heaps!!!
