Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: http call waiting for idle connection monitor to finish #1178

Merged
merged 5 commits into from
Nov 14, 2024

Conversation

0marperez
Copy link
Contributor

@0marperez 0marperez commented Nov 11, 2024

Issue #

Original issue that sparked idle connection monitoring: awslabs/aws-sdk-kotlin#1214
OkHttp4 was broken by idle connection monitoring: #1176

Description of changes

  • Launching the connection monitors using the call context was causing the call to wait on the connection monitor coroutines. This fix creates a coroutine scope dedicated to connection monitors. This new scope will be cancelled/joined when the HTTP engine being used is closed.
  • Noticed shutdown was missing from OkHttp4Engine so added it.
  • The refactoring of OkHttpEngineConfig.buildClient fixes OkHttp4 Support Broken Due to Idle Connection PR #1176 by not including connectionListener as part of OkHttp4Engine

Calling S3 getObject twice went from taking ~6 sec to ~370ms

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

@0marperez 0marperez added the no-changelog Indicates that a changelog entry isn't required for a pull request. Use sparingly. label Nov 11, 2024

This comment has been minimized.

1 similar comment

This comment has been minimized.

@0marperez 0marperez marked this pull request as ready for review November 11, 2024 22:32
@0marperez 0marperez requested a review from a team as a code owner November 11, 2024 22:32
@@ -36,6 +36,7 @@ kotlin {

all {
languageSettings.optIn("aws.smithy.kotlin.runtime.InternalApi")
languageSettings.optIn("okhttp3.ExperimentalOkHttpApi")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Correctness: We should generally not opt into experimental/internal annotations we don't own across entire modules. Our opt-ins should be narrowly scoped and intentional. Please remove this and add @OptIn(ExperimentalOkHttpApi::class) on the specific methods or lines that require the opt-in.

internal class ConnectionIdleMonitor(val pollInterval: Duration) : ConnectionListener() {
private val monitorScope = CoroutineScope(Dispatchers.IO + SupervisorJob())
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Question: HTTP engines have their own coroutine scope. Could we reuse that as a parent scope instead of creating a new one from scratch?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think so, let me test this.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm seeing some issues trying to use the HTTP engine scope. We try to wait for the HTTP engine job to complete before calling shutdown. And that make us wait on the connection idle monitor to just fail it seems like, since it's running in a while loop.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, that's unfortunate. We could probably structure engine closure a bit better to allow for child scopes/jobs but that's a bigger and riskier change. We can stick with this for now.

Comment on lines 28 to 32
fun close(): Unit = runBlocking {
monitors.values.forEach { it.cancelAndJoin() }
monitorScope.cancel()
}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Question: Cancelling/joining child jobs sequentially in a loop is time-consuming. Moreover, the docs say that cancelling a parent job (even a SupervisorJob) should automatically cancel all child jobs. Does this code accomplish the same thing?

fun close(): Unit = runBlocking {
    monitorScope.cancel()
}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The difference is that we're not waiting for the child jobs to actually finish cancelling (join). So we could be leaving some resources behind.

I think it should improve performance if we just cancel and don't wait for a join. It shouldn't be an issue unless someone is using a lot of clients.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tested this and time to cancel the connection idle monitor(s) went from ~200ms to ~600us. Now that I think of it, if someone is opening and closing clients very often and with a lot of connections per client then not waiting for the connection monitors to join could be a bigger issue.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah I see, you're right that could be an issue. So then how about:

fun close(): Unit = runBlocking {
    monitorScope.cancelAndJoin()
}

Copy link
Contributor Author

@0marperez 0marperez Nov 13, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unfortunately there's no cancelAndJoin for CoroutineScope, only Job.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh jeez you're right. Well the CoroutineScope.cancel() method implementation pretty much just verifies the scope contains a job and then calls cancel on that job. We can do something similar here:

val monitorJob = requireNotNull(monitorScope.coroutineContext[Job]) { "<some error msg>" }
monitorJob.cancelAndJoin()

@@ -44,15 +44,15 @@ public class OkHttpEngine(
}

private val metrics = HttpClientMetrics(TELEMETRY_SCOPE, config.telemetryProvider)
private val client = config.buildClient(metrics)
private val connectionIdleMonitor = if (config.connectionIdlePollingInterval != null) ConnectionIdleMonitor(config.connectionIdlePollingInterval) else null
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit:

private val connectionIdleMonitor = config.connectionIdlePollingInterval?.let { ConnectionIdleMonitor(it) }

Comment on lines 102 to 108
@OptIn(ExperimentalOkHttpApi::class)
val connectionListener = if (config.connectionIdlePollingInterval == null) {
ConnectionListener.NONE
} else {
ConnectionIdleMonitor(connectionIdlePollingInterval)
}

// use our own pool configured with the timeout settings taken from config
@OptIn(ExperimentalOkHttpApi::class)
val pool = ConnectionPool(
maxIdleConnections = 5, // The default from the no-arg ConnectionPool() constructor
keepAliveDuration = config.connectionIdleTimeout.inWholeMilliseconds,
TimeUnit.MILLISECONDS,
connectionListener = connectionListener,
)
connectionPool(pool)
connectionPool(poolOverride ?: pool)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: With this change we're now constructing a ConnectionPool we'll just throw away if poolOverride is set. We should only construct the OkHttp4-safe pool if poolOverride is null.

This comment has been minimized.

@@ -111,6 +109,7 @@ internal class ConnectionIdleMonitor(val pollInterval: Duration) : ConnectionLis
logger.trace { "Attempting to reset soTimeout..." }
try {
conn.socket().soTimeout = oldTimeout
logger.trace { "SoTimeout reset." }
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: soTimeout

Comment on lines +74 to +78
override fun shutdown() {
client.connectionPool.evictAll()
client.dispatcher.executorService.shutdown()
metrics.close()
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice catch

This comment has been minimized.

Copy link
Contributor

@ianbotsf ianbotsf left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approved pending cancelAndJoin feedback.

Copy link

Affected Artifacts

Changed in size
Artifact Pull Request (bytes) Latest Release (bytes) Delta (bytes) Delta (percentage)
http-client-engine-okhttp-jvm.jar 115,427 112,330 3,097 2.76%
http-client-engine-okhttp4-jvm.jar 14,324 14,095 229 1.62%

@0marperez 0marperez merged commit 20d93fc into main Nov 14, 2024
16 checks passed
@0marperez 0marperez deleted the connection-idle-monitor-latency branch November 14, 2024 16:32
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
no-changelog Indicates that a changelog entry isn't required for a pull request. Use sparingly.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

OkHttp4 Support Broken Due to Idle Connection PR
3 participants