Evaluate Profile-Guided Optimization (PGO) and Post Link Optimization (PLO) #1433
Amazing stuff @zamazan4ik! I've been micro-optimizing the code and squeezing out more performance with every release, but I've been putting off PGO because I thought the prep work would be prohibitive. Using samply was my next major step in my micro-optimization efforts; that's why I created a samply release profile. Thankfully, you made the connection that qsv's benchmark suite would be the perfect way to create a PGO pipeline, not only for the prebuilt binaries, but also for folks who want to compile and tune their own custom PGO builds based on their data, workload and target platforms! And the payoff is eye-popping! Some commands are 20-30% faster! I'll follow your suggested action points. This will be a great way to put a cherry on top of qsv as I eye a 1.0 release in the next few months... Thanks heaps!!!
Hi!
Recently I checked Profile-Guided Optimization (PGO) improvements on multiple projects. The results are available here. According to the tests, PGO can help with achieving better performance. I also found interesting results about PGO's effects on tsv-utils, a project in a similar domain to qsv. Given all of this, I think trying to optimize qsv with PGO is a good idea. I already did some benchmarks and want to share my results.
Test environment
master branch on commit 531acbb072c48cbaca5d58b593243e0f5f0ec8d3
Right now I cannot run the tests on my Linux machine (Fedora-based) due to some build errors: #1431. But I expect the results to be similar on Linux as well.
Benchmark
For benchmarking, I use this QSV benchmark. For PGO, I use the cargo-pgo tool. The same benchmark suite was used for the PGO training phase, with the instrumented binary built with
cargo pgo build -- --release --locked -F feature_capable,apply,geocode,luau,to,polars --bin qsv
but with LTO disabled. The only change to the benchmark suite was reducing the number of runs, since for the training phase it is enough to run every test case only once. The PGO-optimized results were obtained with QSV built with
cargo pgo optimize build -- --release --locked -F feature_capable,apply,geocode,luau,to,polars --bin qsv
but with LTO disabled. The Release version is built with
cargo build --release --locked -F feature_capable,apply,geocode,luau,to,polars --bin qsv
Unfortunately, due to a bug in the Rustc compiler, PGO currently cannot be enabled together with LTO for QSV. So I compare "QSV with LTO" vs "QSV with PGO". Later, when the bug is fixed, we can apply LTO + PGO to QSV at the same time.
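For anyone who wants to reproduce the steps above, here is a minimal dry-run sketch of the three-phase workflow (instrument, train, optimize). The cargo pgo commands are the ones quoted above. The `run-benchmarks.sh` name is a hypothetical placeholder for the benchmark suite, and setting `CARGO_PROFILE_RELEASE_LTO=off` is only one assumed way to disable LTO (the post does not say how it was done). By default the script only prints the commands; setting `DRY_RUN=0` actually runs them, which assumes cargo-pgo and its LLVM tooling are installed.

```shell
#!/bin/sh
# Dry-run sketch of the PGO workflow: instrument -> train -> optimize.
# Assumptions (not from the original post): BENCH_CMD is a hypothetical
# placeholder for the benchmark script, and LTO is disabled through
# Cargo's profile environment override.

FEATURES="feature_capable,apply,geocode,luau,to,polars"
BENCH_CMD="${BENCH_CMD:-./run-benchmarks.sh}"   # hypothetical name
DRY_RUN="${DRY_RUN:-1}"

# Assumed way to turn LTO off for the release profile without editing Cargo.toml.
export CARGO_PROFILE_RELEASE_LTO=off

# Print a command; execute it only when DRY_RUN=0.
run() {
    echo "+ $*"
    if [ "$DRY_RUN" = "0" ]; then
        "$@"
    fi
}

pgo_workflow() {
    # 1. Build the instrumented binary.
    run cargo pgo build -- --release --locked -F "$FEATURES" --bin qsv || return 1
    # 2. Training phase: run each benchmark case once. The benchmark must be
    #    pointed at the instrumented qsv binary so that profiles get written.
    run "$BENCH_CMD" || return 1
    # 3. Rebuild using the collected profiles.
    run cargo pgo optimize build -- --release --locked -F "$FEATURES" --bin qsv
}

pgo_workflow
```

Run as-is, it prints the three commands, each prefixed with `+`, so the sequence can be reviewed before anything is built.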
Results
I got the following results:
As I interpret the results, PGO measurably improves QSV performance in many cases.
Further steps
I can suggest the following action points:
Testing Post-Link Optimization techniques (like LLVM BOLT) would be interesting too (Clang and Rustc already use BOLT in addition to PGO), but I recommend starting with regular PGO.
Here are some examples of how PGO optimization is integrated in other projects:
configure script