KeepAlive Ping Error Handling #738

Open
lyokato opened this issue Sep 26, 2024 · 3 comments

lyokato commented Sep 26, 2024

The following is an excerpt of the code that sets up KeepAlive in lib/src/client/http2_connection.dart:

      if (options.keepAlive.shouldSendPings) {
        keepAliveManager = ClientKeepAlive(
          options: options.keepAlive,
          ping: () {
            if (transport.isOpen) {
              transport.ping();
            }
          },
          onPingTimeout: () => transport.finish(),
        );
        transport.onFrameReceived
            .listen((_) => keepAliveManager?.onFrameReceived());
      }

We found that the above code occasionally threw an exception when calling transport.ping(): sometimes an HTTP/2 connection error originating on the http2 library side surfaced there.

Therefore, we applied the following patch to keep the exception from escaping uncaught.

              // Handle ping failures (e.g. the connection was already
              // terminated) instead of letting them escape uncaught.
              transport.ping().catchError((_) {
                shutdown();
              });

I am not sure whether this is the appropriate way to fix the problem, but I am reporting it here.
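
For clarity, here is how that patch would sit inside the excerpt above. This is only a sketch of the suggestion as I understand it, not a merged change:

      if (options.keepAlive.shouldSendPings) {
        keepAliveManager = ClientKeepAlive(
          options: options.keepAlive,
          ping: () {
            if (transport.isOpen) {
              // If the ping future completes with a transport error (e.g. the
              // connection was forcefully terminated), handle it here instead
              // of letting it escape as an unhandled asynchronous error.
              transport.ping().catchError((_) {
                shutdown();
              });
            }
          },
          onPingTimeout: () => transport.finish(),
        );
        transport.onFrameReceived
            .listen((_) => keepAliveManager?.onFrameReceived());
      }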

mosuem (Contributor) commented Sep 26, 2024

Thanks for reporting! I will take a look. Do you have any more information on the environment where this occurs? I haven't seen this happening on my machine.

lyokato (Author) commented Sep 27, 2024

I was using the following setup:

  • The client is used with request/response style APIs.
  • To avoid the cost of establishing a connection for every call, a channel is created once and reused; multiple requests are sent through that single channel.
  • For this reason, KeepAlive is enabled on the channel, with pingInterval set to 30 seconds (see the sketch after this list).
  • Each place where a request is sent and the response is received is wrapped in try/catch so that GrpcError can be caught.
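
For reference, here is a minimal sketch of that kind of setup. It assumes the ClientKeepAliveOptions API available in recent package:grpc versions; the host, port, and the MyServiceClient/MyRequest stub types are placeholders rather than our actual application code:

    import 'package:grpc/grpc.dart';

    final channel = ClientChannel(
      'api.example.com', // placeholder host
      port: 443,
      options: const ChannelOptions(
        credentials: ChannelCredentials.secure(),
        keepAlive: ClientKeepAliveOptions(
          // Send a KeepAlive ping every 30 seconds on the reused channel.
          pingInterval: Duration(seconds: 30),
        ),
      ),
    );

    Future<void> callOnce(MyServiceClient client, MyRequest request) async {
      try {
        // The request/response call is wrapped in try/catch so that
        // GrpcError can be handled by the application.
        final response = await client.myMethod(request);
        print(response);
      } on GrpcError catch (e) {
        print('gRPC error: $e');
      }
    }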

When users ran the app under these conditions, the following error was reported to Firebase Crashlytics and Bugsnag:

HTTP/2 error: Connection error: Connection is being forcefully terminated. (errorCode: 10)

This is what was reaching PlatformDispatcher.onError in the Flutter application, which means an uncaught error was occurring somewhere.

As described above, the request path is already wrapped in try/catch, which is why I suspected the KeepAlive ping.


Then we looked at where this error originated in the first place.

https://github.com/dart-lang/http2/blob/master/lib/src/connection.dart

There are several places in this code where the following call is made:

 _terminate(ErrorCode.CONNECT_ERROR, causedByTransportError: true);

The _terminate method called here contains the following:

      var exception = TransportConnectionException(
          errorCode, 'Connection is being forcefully terminated.');

      // Close all streams & stream queues
      _streams.terminate(exception);

      // Close the connection queues
      _incomingQueue.terminate(exception);
      _outgoingQueue.terminate(exception);

      _pingHandler.terminate(exception);
      _settingsHandler.terminate(exception);

From the error message, we could see that this was exactly the exception that occurred in our case.

We found that this exception was being passed to streams, incomingQueue, outgoingQueue, pingHandler, and settingsHandler.

Since we suspected ping, we looked at the code of this _pingHandler and traced how it is used from the grpc-dart side. Presumably _pingHandler.terminate completes any outstanding ping future with this exception, so the error surfaces through the future returned by transport.ping().

As a result, we found that no catchError was attached at the point covered by the patch above.

After applying the patch and letting users run the app again, the error is no longer reaching PlatformDispatcher.onError.
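
To illustrate the failure mode as we understand it, here is a minimal, self-contained Dart sketch (not grpc-dart code): a fire-and-forget Future that completes with an error and has no catchError attached becomes an unhandled asynchronous error, which in a Flutter app ends up in PlatformDispatcher.onError.

    import 'dart:async';

    // Stand-in for transport.ping() failing with a transport error.
    Future<void> ping() async {
      throw Exception('Connection is being forcefully terminated.');
    }

    void main() {
      runZonedGuarded(() {
        // No await and no catchError: the error escapes the caller and is
        // delivered to the zone's uncaught-error handler (in a Flutter app,
        // PlatformDispatcher.onError).
        ping();
      }, (error, stack) {
        print('uncaught async error: $error');
      });

      // With catchError attached, the error is handled locally instead.
      ping().catchError((_) {
        print('handled locally, e.g. by calling shutdown()');
      });
    }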


We do not know how to reproduce this reliably.

We are using gRPC through an AWS load balancer.
It is possible that the problem occurs when a connection is kept open for a long time through the load balancer, but we have not yet investigated that in detail.
We plan to investigate further.

mosuem (Contributor) commented Sep 27, 2024

Thanks for the write-up, this is helpful.
