Add non-exiting methods and implement SIGHUP reload #405

samurai00 · 2024-09-30T10:29:41Z

Hey team! I've tried to make a few small tweaks to our server to hopefully improve its flexibility and add some reload functionality. Here's what's new:

Added pub fn try_bootstrap(&mut self) -> Result<bool> that doesn't call exit
Refactored pub fn bootstrap(&mut self) to use the new try_bootstrap method
Added pub fn run_server(mut self, enable_daemon: bool) -> Result<bool> that doesn't call exit
Kept pub fn run_forever(mut self) -> ! the same, but now it calls run_server under the hood
Implemented SIGHUP catching for reloading
Added an example server_reload to show how to reload within a tokio runtime
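
The split between exiting and non-exiting methods listed above could look roughly like the sketch below. This is an illustration only, not the code in this PR: `DemoServer`, its fields, and the `reload_requested` parameter are hypothetical. The meaning of the returned `bool` (assumed here to mean "a reload was requested") is a guess based on the example's `Reload: true` log line.

```rust
use std::process;

/// Hypothetical stand-in for the real server type (not pingora's `Server`).
pub struct DemoServer {
    /// Test knob (assumption): force bootstrap to fail.
    fail_bootstrap: bool,
    bootstrapped: bool,
}

impl DemoServer {
    pub fn new(fail_bootstrap: bool) -> Self {
        DemoServer { fail_bootstrap, bootstrapped: false }
    }

    /// Non-exiting variant: failures are returned to the caller.
    pub fn try_bootstrap(&mut self) -> Result<bool, String> {
        if self.fail_bootstrap {
            return Err("bootstrap failed".to_string());
        }
        self.bootstrapped = true;
        Ok(true)
    }

    /// Exiting variant: keeps the old behavior by wrapping `try_bootstrap`.
    pub fn bootstrap(&mut self) {
        if let Err(e) = self.try_bootstrap() {
            eprintln!("{e}");
            process::exit(1);
        }
    }

    /// Non-exiting run: returns instead of terminating the process.
    /// `reload_requested` simulates a received SIGHUP (assumption).
    pub fn run_server(self, reload_requested: bool) -> Result<bool, String> {
        if !self.bootstrapped {
            return Err("server was not bootstrapped".to_string());
        }
        Ok(reload_requested)
    }

    /// Exiting run: delegates to `run_server`, then exits the process.
    pub fn run_forever(self) -> ! {
        let code = match self.run_server(false) {
            Ok(_) => 0,
            Err(_) => 1,
        };
        process::exit(code)
    }
}
```

With this shape, a caller that wants to stay alive (for example inside an existing tokio runtime) uses `try_bootstrap` and `run_server` and decides what to do with the result, while existing callers of `bootstrap` and `run_forever` see no change.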

Related issues:

eaufavor · 2024-09-30T19:03:16Z

I think it is nice to break down the functions so that it is more flexible.

On the other hand, the reload in the example has a downtime (the old service shuts down before the new service starts). Do you have a plan to address that?

To me, the way to do a graceful reload is the following, when the reload signal arrives:

1) start the new service and wait for the listeners to be passed to it
2) signal the old service to gracefully exit
3) wait for grace_period_seconds before fully stopping/killing the old service

The current run_server() does 2) + 3) before returning for the example to do 1). So there is a window in which the service appears to be offline.
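
The ordering described above can be sketched as a small simulation. It is not pingora code: `Service`, `State`, `graceful_reload`, and the event strings are hypothetical, and the listener handoff is reduced to creating the new service in the `Serving` state.

```rust
use std::thread::sleep;
use std::time::Duration;

#[derive(Debug, Clone, Copy, PartialEq)]
enum State {
    Serving,
    Draining,
    Stopped,
}

/// Hypothetical service handle used only for this illustration.
#[derive(Debug)]
struct Service {
    state: State,
}

/// Runs the three steps in the proposed order and records them in `events`.
fn graceful_reload(
    old: &mut Service,
    grace_period: Duration,
    events: &mut Vec<&'static str>,
) -> Service {
    // 1) Start the new service; it is serving once it holds the listeners.
    let new = Service { state: State::Serving };
    events.push("new service serving");

    // 2) Only now signal the old service to exit gracefully.
    old.state = State::Draining;
    events.push("old service draining");

    // 3) Wait for the grace period, then stop the old service for good.
    sleep(grace_period);
    old.state = State::Stopped;
    events.push("old service stopped");

    new
}
```

Because step 1 completes before step 2 begins, there is no point in this sequence at which neither service is accepting connections.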

samurai00 · 2024-10-01T05:33:58Z

I agree that the new service should be started before shutting down the old one, and I thought I had attempted to do this in the example. After conducting some simple tests, it seemed to work as expected.

However, more thorough testing might be necessary to ensure everything is functioning correctly. I'll conduct some additional tests to verify this.

samurai00 · 2024-10-08T03:30:27Z

> I think it is nice to break down the functions so that it is more flexible.
>
> On the other hand, the reload in the example has a downtime (the old service shuts down before the new service starts). Do you have a plan to address that?
>
> To me, the way to do a graceful reload is the following, when the reload signal arrives:
>
> 1) start the new service and wait for the listeners to be passed to it
> 2) signal the old service to gracefully exit
> 3) wait for grace_period_seconds before fully stopping/killing the old service
>
> The current run_server() does 2) + 3) before returning for the example to do 1). So there is a window in which the service appears to be offline.

After testing, I found that the example actually behaves as follows:

signal the old service to gracefully exit
start the new service and wait for the listeners to be passed to it

These two steps can be considered to occur simultaneously.

If the old service doesn't start to gracefully exit, the call to bootstrap() during the upgrade will fail, preventing the new service from actually starting.
It seems there wouldn't be much difference if we notify the old service to start gracefully exiting before calling bootstrap().

Based on analysis and the logs, it appears that under low concurrency requests continue to be served without downtime from the time the old service begins to gracefully exit until the new service has successfully started.

Logs:

[2024-10-08T03:17:27Z INFO  pingora_core::server] SIGHUP received, sending socks and gracefully reloading
[2024-10-08T03:17:27Z INFO  pingora_core::server] Trying to send socks
[2024-10-08T03:17:27Z WARN  pingora_core::server::transfer_fd] server not ready, will try again in 1s
[2024-10-08T03:17:27Z INFO  pingora_core::server] Bootstrap starting
[2024-10-08T03:17:27Z ERROR pingora_core::server::transfer_fd] No incoming socket transfer, sleep 1s and try again
[2024-10-08T03:17:28Z DEBUG server_reload::app::echo] request count: 109
[2024-10-08T03:17:28Z DEBUG server_reload::app::echo] request count: 110
[2024-10-08T03:17:28Z DEBUG server_reload::app::echo] request count: 111
[2024-10-08T03:17:28Z DEBUG server_reload::app::echo] request count: 112
[2024-10-08T03:17:28Z INFO  pingora_core::server] listener sockets sent
[2024-10-08T03:17:28Z INFO  pingora_core::server] Bootstrap done
[2024-10-08T03:17:28Z INFO  pingora_core::server] Server starting
[2024-10-08T03:17:28Z DEBUG server_reload::app::echo] request count: 113
... similar debug logs omitted ...
[2024-10-08T03:17:33Z DEBUG server_reload::app::echo] request count: 136
[2024-10-08T03:17:33Z INFO  pingora_core::server] Broadcasting graceful shutdown
[2024-10-08T03:17:33Z INFO  pingora_core::server] Graceful shutdown started!
[2024-10-08T03:17:33Z INFO  pingora_core::server] Broadcast graceful shutdown complete
[2024-10-08T03:17:33Z INFO  pingora_core::server] Graceful shutdown: grace period 5s starts
[2024-10-08T03:17:33Z INFO  pingora_core::services::listening] Shutting down 0.0.0.0:6145
[2024-10-08T03:17:33Z INFO  pingora_core::server] service exited.
[2024-10-08T03:17:33Z DEBUG server_reload::app::echo] request count: 137
... similar debug logs omitted ...
[2024-10-08T03:17:38Z DEBUG server_reload::app::echo] request count: 160
[2024-10-08T03:17:38Z INFO  pingora_core::server] Graceful shutdown: grace period ends
[2024-10-08T03:17:38Z INFO  pingora_core::server] Waiting for runtimes to exit!
[2024-10-08T03:17:38Z DEBUG server_reload::app::echo] request count: 161
... similar debug logs omitted ...
[2024-10-08T03:17:43Z DEBUG server_reload::app::echo] request count: 184
[2024-10-08T03:17:43Z INFO  pingora_core::server] All runtimes exited, exiting now
[2024-10-08T03:17:43Z DEBUG server_reload::app::echo] request count: 185
[2024-10-08T03:17:43Z INFO  server_reload] Reload: true
...
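
The log above shows the socket handoff being retried ("will try again in 1s") and the graceful shutdown being broadcast only after "listener sockets sent". A generic sketch of that pattern follows. It does not use pingora's API: `send_with_retry`, `reload`, the closure-based `send`, and the event strings are all hypothetical.

```rust
use std::thread::sleep;
use std::time::Duration;

/// Calls `send` up to `max_tries` times, sleeping `delay` between failures.
/// Returns the number of attempts used on success, or the last error.
fn send_with_retry<F>(mut send: F, max_tries: u32, delay: Duration) -> Result<u32, String>
where
    F: FnMut() -> Result<(), String>,
{
    let mut last_err = String::from("no attempts made");
    for attempt in 1..=max_tries {
        match send() {
            Ok(()) => return Ok(attempt),
            Err(e) => {
                last_err = e;
                if attempt < max_tries {
                    sleep(delay);
                }
            }
        }
    }
    Err(last_err)
}

/// Hands off the listeners first and starts shutdown only if that succeeded.
/// Returns true when the old service may begin its graceful shutdown.
fn reload<F>(send: F, max_tries: u32, delay: Duration, events: &mut Vec<&'static str>) -> bool
where
    F: FnMut() -> Result<(), String>,
{
    match send_with_retry(send, max_tries, delay) {
        Ok(_) => {
            events.push("listener sockets sent");
            events.push("broadcast graceful shutdown");
            true
        }
        Err(_) => {
            // Handoff failed: keep serving rather than shutting down.
            events.push("handoff failed, old service keeps serving");
            false
        }
    }
}
```

Gating the shutdown broadcast on a successful handoff means a failed reload leaves the old service running instead of taking the listener offline.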

johnhurt self-assigned this Oct 4, 2024

samurai00 force-pushed the reload-signal branch from 6650a32 to 54cb6e0 Compare October 12, 2024 06:18

samurai00 force-pushed the reload-signal branch 2 times, most recently from 951fbfc to b9a6827 Compare November 4, 2024 03:46

samurai00 added 4 commits November 12, 2024 17:26

handle HUP signal; try_bootstrap() run_server() added

775bee4

server_reload example added

168ad1f

fix: correct usage of upgrade variable

b14f9b4

Ensure sending socks success before broadcasting shutdown signal

ee463c3

samurai00 force-pushed the reload-signal branch from b9a6827 to ee463c3 Compare November 12, 2024 09:26
