This repository has been archived by the owner on Sep 25, 2020. It is now read-only.
While Issue #42 is still pending ... I just wanted to check whether Ringpop currently debars all 'Reject M' messages from a node N when node N itself is on the suspect/faulty list of all the others?
(The use case: partitioned sets mark nodes in the other partition as faulty. Now assume the network recovers; the reject messages would cross over to the other partition, thereby marking alive nodes in the opposite partition as faulty.)
(I've just come from watching the Ringpop@Rackspace video and reading the SWIM paper ... so pardon me if I am missing the elephant in the room.)
@robins Thanks for your question. As implemented now, faulty members are not pinged. Ringpop would not expect a ping to be sent from such a member, and if one arrived, Ringpop would not do the right thing (mark the sender as alive in its membership).
The resolution to issue #42 will be to periodically ping faulty members, likely at a lower rate than the normal protocol period; the sender will have to assert its aliveness that way.
Does that answer your question? I was confused by what you meant about 'Reject M' messages. But I tried my best to answer. Let me know!
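To make the behaviour described above concrete, here is a minimal sketch (hypothetical names; this is not Ringpop's actual code): faulty members are excluded from the normal ping rotation, and a ping arriving *from* a faulty member is dropped rather than taken as proof of aliveness.

```python
# Hypothetical sketch of the behaviour described above; not Ringpop's code.

class Membership:
    def __init__(self):
        # address -> status: 'alive', 'suspect', or 'faulty'
        self.members = {}

    def ping_candidates(self):
        # Faulty members are excluded from the normal ping rotation.
        return [addr for addr, status in self.members.items()
                if status != 'faulty']

    def handle_ping(self, sender):
        # A ping from a faulty member is ignored rather than treated
        # as evidence that the sender is alive again; the sender must
        # reassert its aliveness through the slower recovery path.
        if self.members.get(sender) == 'faulty':
            return False
        return True

m = Membership()
m.members = {'10.0.0.1': 'alive', '10.0.0.2': 'faulty'}
print(m.ping_candidates())        # faulty member is skipped
print(m.handle_ping('10.0.0.2'))  # ping from faulty member is dropped
```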
Thanks @jwolski ... but I don't think I explained myself well earlier.
I'll try to elaborate with a worst-case scenario. The issue here isn't so much whether faulty members are pinged, but whether requests/messages (not pings) from faulty members are processed at all...
Let's assume that, owing to network issues, 20 nodes get split into two sets: A (nodes 1-10) and B (nodes 11-20). If the network stays disconnected long enough, every node in set B will be ready to mark nodes 1-10 (set A) as faulty, and vice versa. Now, if the network comes back alive just before that announcement, we're essentially going to have a bloodbath: nodes in set A announce that nodes 11-20 are faulty, and vice versa. If nothing else, we'd see a huge (unnecessary) drop in alive nodes during such network reconnects.
As the title suggests, this could be avoided or mitigated if, just like pings, reject messages from members currently on the faulty list were not processed.
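The mitigation proposed here could be sketched as a guard in the membership-update path (hypothetical names; a simplification, not Ringpop's actual code): a 'faulty' declaration is dropped when its sender is itself on the local faulty list.

```python
# Hypothetical sketch of the proposed mitigation; not Ringpop's code.

def apply_update(local_members, sender, target, declared_status):
    """Apply a gossiped status declaration, ignoring declarations
    that originate from members we currently consider faulty."""
    if local_members.get(sender) == 'faulty':
        # Just like pings, declarations from a member on our own
        # faulty list are dropped, so two healed partitions do not
        # immediately mark each other's alive nodes faulty.
        return False
    local_members[target] = declared_status
    return True

# Set A's view after the partition: nodes in set B are faulty.
view_a = {'nodeB1': 'faulty', 'nodeA2': 'alive'}

# When the network heals, nodeB1's 'faulty' declaration about nodeA2
# is ignored rather than knocking an alive node out of the cluster.
print(apply_update(view_a, 'nodeB1', 'nodeA2', 'faulty'))
print(view_a['nodeA2'])  # nodeA2 stays alive in set A's view
```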