Benchmarking changes of the wire protocol #110

rockdaboot · 2024-08-12T10:47:23Z

When making changes to the wire protocol, we should take into account the effect on CPU usage, memory usage and network bandwidth. For this we need tooling for (nearly) reproducible benchmarks.

Roughly, my thoughts are:

- record the data passed to the Reporter
- replay previously recorded data (with the same order and timing!)
- record the uncompressed on-wire messages (protobuf blobs)
- a Go benchmark tool that compresses and decompresses the protobuf messages (Go because we want to measure the Go implementations of the compressors)
- a Python tool to generate diagrams/tables from the results of the Go tool

The recorded data can be replayed multiple times, e.g. with and without a protocol implementation change, to compare the effects of that change.

florianl · 2024-08-12T11:20:44Z

While designing and creating the OTel Profiling protocol, @petethepig invested considerable time and effort in benchmarks; see petethepig/opentelemetry-collector#1. He also documented changes and potential options in https://docs.google.com/spreadsheets/d/1Q-6MlegV8xLYdz5WD5iPxQU2tsfodX1-CDV1WeGzyQ0/edit?gid=1732807979#gid=1732807979.

It might be worth considering building on this existing work.

athre0z · 2024-08-12T18:02:44Z

A simpler approach that @christos68k and I have used previously is to build two profiling agents, one for each protocol you want to compare, and run them at the same time on the same machine under a heavy workload while recording the sum of all message sizes. Sampling won't interrupt exactly the same traces in both agents, but over a run of an hour or so this should statistically give a pretty good estimate. From previous experience with differential flamegraphs of two agents running on the same machine, I'd expect the error to be in the range of 0.5-1%. This approach is arguably harder for other reviewers to reproduce than @petethepig's approach or the one described in this issue.

rockdaboot · 2024-08-15T16:21:13Z

#120 is a PoC for the ideas outlined in the issue description.

rockdaboot added the discussion label Aug 12, 2024

rockdaboot mentioned this issue Aug 12, 2024

profiles/follow up: location references in sample open-telemetry/opentelemetry-specification#4307 (Open)

rockdaboot mentioned this issue Aug 15, 2024

Tooling to benchmark wire messages #120 (Open)

rockdaboot changed the title from "Benchmarking changes to the wire protocol" to "Benchmarking changes of the wire protocol" Aug 20, 2024
