
(exporterhelper): Add concurrency sender to send batches concurrently #10478

Closed

Conversation

@moh-osman3 (Contributor) commented Jun 28, 2024

Description

This PR adds a new component to the exporterhelper called the concurrency sender. This component limits the concurrency of RPCs independently of the queue sender's num_consumers. This is desirable because the queue_sender does not apply backpressure, so the concurrency_sender can limit in-flight requests based on a configured concurrency limit. Additionally, the batch_sender can generate multiple batches from a single request, which are currently exported sequentially; the concurrency_sender provides a way to export these batches in parallel.

This PR also changes the requestSender interface:

type requestSender interface {
	component.Component
	send(context.Context, ...Request) error
	setNextSender(nextSender requestSender)
}

This allows us to optionally send multiple requests, so that batch producers can receive the errors associated with each of their exported batches.
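As a rough sketch of how the variadic send can be used, something like the following could implement the concurrency sender. This is illustrative only, not the PR's actual code: the limit and nextSender fields and the use of the standard library's errors.Join are assumptions, and imports of context, errors, and sync are elided.

// Minimal sketch only -- not the PR's implementation. Assumes a concurrency
// sender with a configured limit and a nextSender field.
func (cs *concurrencySender) send(ctx context.Context, reqs ...Request) error {
	var (
		wg   sync.WaitGroup
		mu   sync.Mutex
		errs error
	)
	sem := make(chan struct{}, cs.limit) // bounds the number of in-flight exports
	for _, req := range reqs {
		req := req
		sem <- struct{}{} // acquire a slot before spawning a goroutine
		wg.Add(1)
		go func() {
			defer wg.Done()
			defer func() { <-sem }() // release the slot when this export finishes
			if err := cs.nextSender.send(ctx, req); err != nil {
				mu.Lock()
				errs = errors.Join(errs, err)
				mu.Unlock()
			}
		}()
	}
	wg.Wait()
	// The caller (e.g. a batch producer) gets back one error covering all requests.
	return errs
}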

Link to tracking issue

Related to #10368

Testing

Added unit tests and performed local testing to ensure concurrency works and errors are propagated back to batch producers

@jmacd (Contributor) left a comment:

This approach looks 💯 to me, thanks @moh-osman3.

exporter/exporterhelper/common.go (Outdated)
@@ -21,7 +21,7 @@ import (
 // requestSender is an abstraction of a sender for a request independent of the type of the data (traces, metrics, logs).
 type requestSender interface {
 	component.Component
-	send(context.Context, Request) error
+	send(context.Context, ...Request) error
@dmitryax (Member) commented Jul 8, 2024:

This allows us to optionally send multiple requests, so that batch producers can receive the errors associated with each of their exported batches.

Can you please elaborate on why this change is required with some examples? I don't understand why it's not possible without this change

@moh-osman3 (Contributor, Author) replied:

Thanks for the review! The main issue comes from returning multiple errors back to a single batch producer. For example, say one request enters the batch_sender and is split into 50 batches. We can't return from sendMergeSplitBatch until we collect a response from all 50 batch exports. In the current implementation we export these batches sequentially, waiting for the previous batch to return, when ideally the exports should run concurrently, subject to a concurrency limit. To make them concurrent without spawning more goroutines than that limit allows, we need to pass the full slice of requests to the concurrency_sender so it can bound the number of goroutines used for export.
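To make the flow concrete, here is a hypothetical sketch of the producer side; the splitIntoBatches helper name is invented for illustration and is not the batch sender's real API:

// Hypothetical sketch: split one incoming request into batches, then hand them
// all to the next sender in a single variadic call. The call returns only after
// every batch export has completed, so the producer receives a combined error
// for all of its batches, while the downstream concurrency_sender bounds how
// many of them are exported in parallel.
batches, err := bs.splitIntoBatches(ctx, req) // invented helper name
if err != nil {
	return err
}
return bs.nextSender.send(ctx, batches...)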

 if bs.activeBatch.request != nil {
 	var err error
-	req, err = bs.mergeFunc(ctx, bs.activeBatch.request, req)
+	req, err = bs.mergeFunc(ctx, bs.activeBatch.request, reqs[0])
A Member commented:

Why do we ignore other requests?

@moh-osman3 (Contributor, Author) replied Jul 9, 2024:

Ah, I misunderstood this function and mistakenly thought it would only ever be called with len(reqs) == 1. I now realize it is called when MaxBatchSize is not set, so I am now merging all requests into a single request. Thanks for catching this.
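A sketch of that merge-all behavior, reusing the pairwise mergeFunc signature visible in the diff above (error handling abbreviated, illustrative only):

// Fold every incoming request into a single request via the pairwise mergeFunc.
merged := reqs[0]
for _, r := range reqs[1:] {
	var err error
	merged, err = bs.mergeFunc(ctx, merged, r)
	if err != nil {
		return err
	}
}
// merged is then combined with bs.activeBatch.request as before.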

func WithConcurrency(config ConcurrencySettings) Option {
	return func(o *baseExporter) error {
		if !config.Enabled {
			o.exportFailureMessage += " Try enabling the concurrency config option to send more requests in parallel."
@dmitryax (Member) commented Jul 8, 2024:

This seems misleading. I'm not sure how enabling a concurrency limit can help with export errors; there is no limit on concurrency by default.

@moh-osman3 (Contributor, Author) replied:

Simplified the config, so this message is now removed.

const defaultNumSenders = 10

type ConcurrencySettings struct {
	Enabled bool `mapstructure:"enabled"`
A Member commented:

This option seems excessive; it's equivalent to just checking whether concurrency is set to any number other than 0.

We probably don't even need a configuration grouping here. Just one field in the exporter config similar to timeout, e.g.:

otlp:
  timeout: 10s
  concurrency: 100
  sending_queue:
    enabled: true
  retry_on_failure:
    enabled: true

@moh-osman3 (Contributor, Author) replied:

Thanks, simplified my config as you suggested
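For illustration, a hypothetical sketch of what the simplified configuration could map to in Go; the field and tag names here are illustrative, not the exporter's actual Config:

type Config struct {
	Timeout time.Duration `mapstructure:"timeout"`
	// Concurrency caps the number of parallel exports; 0 (the default) means no limit.
	Concurrency int `mapstructure:"concurrency"`
	// sending_queue, retry_on_failure, etc. omitted for brevity.
}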

@moh-osman3 force-pushed the mohosman/exporterhelper-errors branch 2 times, most recently from 42dfdfc to de3f418 on July 9, 2024
if bs, ok := be.batchSender.(*batchSender); ok {
	// If queue sender is enabled assign to the batch sender the same number of workers.
	if qs, ok := be.queueSender.(*queueSender); ok {
		bs.concurrencyLimit = int64(qs.numConsumers)
@dmitryax (Member) commented Jul 10, 2024:

The biggest problem with this approach is that it ignores the number of queue workers. It means that, typically, every batch would have to sit for the full flush_timeout period before it can be sent further. That's why the batch sender's concurrency limit is tied to the number of queue consumers: whenever all the workers are stuck on one batch, we release them right away by flushing it. That behavior is gone with this PR.
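For context, a simplified sketch of the early-flush behavior being described (not the real batchSender code; field names are illustrative): the active batch is exported as soon as the number of requests blocked on it reaches the limit derived from the queue's consumers, instead of waiting for flush_timeout.

// Simplified illustration only.
batch.requestsBlocked++
if bs.concurrencyLimit > 0 && batch.requestsBlocked >= bs.concurrencyLimit {
	// Every queue worker is now waiting on this batch: export it immediately
	// so the workers are released right away rather than after flush_timeout.
	bs.exportActiveBatch(ctx)
}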

@dmitryax (Member) added Jul 10, 2024:

So, it doesn't resolve the problem stated in #10368. Just adding an independent concurrency limit doesn't help with that

@moh-osman3 (Contributor, Author) replied:

@dmitryax Thanks for the feedback. I've spent a bit of time thinking about options here so that the queue_sender consumers don't block when there are more items we could batch. I'm wondering if we can swap the order of the batch sender and the queue sender, so that we are actually queueing batches instead of queueing the initial request that comes in. Alternatively, the queue_sender could do some prebatching itself, i.e. keep popping items from the queue until it has roughly a batch's worth to hand to the batch_sender.

@moh-osman3 (Contributor, Author) added:

Another possibility I explored was unblocking consumers by calling consumeFunc in a goroutine, so that the batch_sender isn't blocked for the flush timeout and more requests can be consumed. However, this might not work for the persistent queue, which needs to know when consumeFunc returns so it can mark items for deletion from the persistent queue.
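To illustrate the persistent-queue concern, here is a simplified, hypothetical consume loop (names invented; not the collector's actual queue code):

for {
	req, onProcessingFinished, ok := q.getNextItem(ctx)
	if !ok {
		return
	}
	// If consumeFunc were launched in a goroutine here, the loop could not
	// wait for its result, so the item would be acknowledged (and eligible
	// for deletion from the persistent queue) before the export finished.
	err := consumeFunc(ctx, req)
	onProcessingFinished(err)
}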

@moh-osman3 (Contributor, Author) added:

Any thoughts on these options?

A Member replied:

A few problems with putting the queue before the batcher:

  • The queue is what adds asynchronous behavior to exporters. If it's not enabled, the batcher is expected to concurrently block the incoming requests.
  • Persistent queue provides durability. A crashed collector is expected to not lose any data. If we put another memory queue in front of batching, it breaks that promise.

A Contributor replied:

Another possibility I explored was unblocking consumers by calling consumeFunc in a goroutine, so that the batch_sender isn't blocked for the flush timeout and more requests can be consumed. However, this might not work for the persistent queue, which needs to know when consumeFunc returns so it can mark items for deletion from the persistent queue.

I've been coming to this conclusion. I'm imagining it would be something like the following:

  • batcher reads, but does not remove, records from the queue until it has a batch of min size
  • batcher sends the batch to the next consumer in the chain -- this could be the concurrency sender
  • on success, batcher removes the records from the queue

Perhaps there could be an option on the batcher to remove records from the queue optimistically, for users that prefer at most once vs. at least once delivery (default being at least once, for durability).

This is not trivial of course - it would mean a substantial change to the queue API.
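A hypothetical sketch of what such a peek/commit queue API could look like; the interface and method names are invented for illustration and differ from the collector's actual Queue API:

type BatchableQueue[T any] interface {
	// Peek returns up to max items without removing them from the queue.
	Peek(ctx context.Context, max int) (items []T, batchID uint64, err error)
	// Commit removes a previously peeked batch after a successful export
	// (at-least-once delivery).
	Commit(ctx context.Context, batchID uint64) error
	// Rollback makes a peeked batch visible to readers again after a failed export.
	Rollback(ctx context.Context, batchID uint64) error
}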

@moh-osman3 force-pushed the mohosman/exporterhelper-errors branch from ef787ea to 5432c27 on August 5, 2024
This PR was marked stale due to lack of activity. It will be closed in 14 days.

@github-actions bot added the Stale label Aug 20, 2024
@github-actions bot commented Sep 4, 2024:

Closed as inactive. Feel free to reopen if this PR is still being worked on.

@github-actions bot closed this Sep 4, 2024