Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix checking distance of last snapshot #539

Merged
merged 2 commits into from
Oct 12, 2024

Conversation

antonio2368
Copy link
Contributor

In one of our tests, NuRaft ended up creating 2 snapshots with a distance of 1 log.

I didn't find anything strange in the logs:

2024.09.28 03:34:06.424741 [ 606 ] {} <Debug> RaftInstance: commit upto 50, current idx 49
2024.09.28 03:34:06.424773 [ 606 ] {} <Trace> RaftInstance: commit upto 50, current idx 50
2024.09.28 03:34:06.424871 [ 606 ] {} <Information> RaftInstance: creating a snapshot for index 50
2024.09.28 03:34:06.424891 [ 606 ] {} <Information> RaftInstance: create snapshot idx 50 log_term 1
2024.09.28 03:34:06.424897 [ 605 ] {} <Debug> RaftInstance: Receive a append_entries_response message from peer 3 with Result=1, Term=1, NextIndex=51
2024.09.28 03:34:06.425021 [ 596 ] {} <Debug> RaftInstance: Receive a append_entries_response message from peer 2 with Result=1, Term=1, NextIndex=51
2024.09.28 03:34:06.425072 [ 596 ] {} <Trace> RaftInstance: peer 2, prev matched idx: 49, new matched idx: 50
2024.09.28 03:34:06.425104 [ 596 ] {} <Debug> RaftInstance: reqeust append entries need to catchup, p 2
2024.09.28 03:34:06.425130 [ 596 ] {} <Debug> RaftInstance: append_entries for 2 with LastLogIndex=50, LastLogTerm=1, EntriesLength=0, CommitIndex=50, Term=1, peer_last_sent_idx 50
2024.09.28 03:34:06.425140 [ 606 ] {} <Information> RaftInstance: create snapshot idx 50 log_term 1 done: 221 us elapsed
2024.09.28 03:34:06.425165 [ 606 ] {} <Debug> RaftInstance: DONE: commit upto 50, current idx 50
2024.09.28 03:34:06.425520 [ 592 ] {} <Debug> RaftInstance: Receive a append_entries_response message from peer 2 with Result=1, Term=1, NextIndex=51
2024.09.28 03:34:06.425991 [ 583 ] {} <Debug> RaftInstance: Receive a client_request message from 0 with LastLogIndex=0, LastLogTerm 0, EntriesLength=1, CommitIndex=0 and Term=0
2024.09.28 03:34:06.426080 [ 583 ] {} <Debug> RaftInstance: append at log_idx 51, timestamp 1727494446426011
2024.09.28 03:34:06.426120 [ 583 ] {} <Debug> RaftInstance: asynchronously replicated 51, return value 0x503000827260
2024.09.28 03:34:06.426310 [ 607 ] {} <Debug> RaftInstance: append_entries for 3 with LastLogIndex=50, LastLogTerm=1, EntriesLength=1, CommitIndex=50, Term=1, peer_last_sent_idx 51
2024.09.28 03:34:06.426435 [ 607 ] {} <Debug> RaftInstance: append_entries for 2 with LastLogIndex=50, LastLogTerm=1, EntriesLength=1, CommitIndex=50, Term=1, peer_last_sent_idx 51
2024.09.28 03:34:06.429853 [ 601 ] {} <Debug> RaftInstance: Receive a append_entries_response message from peer 2 with Result=1, Term=1, NextIndex=52
2024.09.28 03:34:06.429906 [ 601 ] {} <Trace> RaftInstance: peer 2, prev matched idx: 50, new matched idx: 51
2024.09.28 03:34:06.429984 [ 601 ] {} <Debug> RaftInstance: append_entries for 3 with LastLogIndex=50, LastLogTerm=1, EntriesLength=1, CommitIndex=51, Term=1, peer_last_sent_idx 51
2024.09.28 03:34:06.430093 [ 601 ] {} <Debug> RaftInstance: append_entries for 2 with LastLogIndex=51, LastLogTerm=1, EntriesLength=0, CommitIndex=51, Term=1, peer_last_sent_idx 51
2024.09.28 03:34:06.430190 [ 606 ] {} <Debug> RaftInstance: commit upto 51, current idx 50
2024.09.28 03:34:06.430207 [ 604 ] {} <Debug> RaftInstance: Receive a append_entries_response message from peer 3 with Result=1, Term=1, NextIndex=52
2024.09.28 03:34:06.430225 [ 606 ] {} <Trace> RaftInstance: commit upto 51, current idx 51
2024.09.28 03:34:06.430262 [ 604 ] {} <Trace> RaftInstance: peer 3, prev matched idx: 50, new matched idx: 51
2024.09.28 03:34:06.430324 [ 582 ] {} <Information> RaftInstance: snapshot idx 50 log_term 1 created, compact the log store if needed
2024.09.28 03:34:06.430391 [ 591 ] {} <Debug> RaftInstance: Receive a append_entries_response message from peer 2 with Result=1, Term=1, NextIndex=52
2024.09.28 03:34:06.430444 [ 606 ] {} <Information> RaftInstance: creating a snapshot for index 51
2024.09.28 03:34:06.430465 [ 606 ] {} <Information> RaftInstance: create snapshot idx 51 log_term 1
2024.09.28 03:34:06.430502 [ 590 ] {} <Debug> RaftInstance: Receive a append_entries_response message from peer 3 with Result=1, Term=1, NextIndex=52
2024.09.28 03:34:06.430576 [ 606 ] {} <Information> RaftInstance: create snapshot idx 51 log_term 1 done: 86 us elapsed
2024.09.28 03:34:06.430596 [ 606 ] {} <Debug> RaftInstance: DONE: commit upto 51, current idx 51
2024.09.28 03:34:06.431433 [ 583 ] {} <Debug> RaftInstance: Receive a client_request message from 0 with LastLogIndex=0, LastLogTerm 0, EntriesLength=1, CommitIndex=0 and Term=0
2024.09.28 03:34:06.431547 [ 583 ] {} <Debug> RaftInstance: append at log_idx 52, timestamp 1727494446431453
2024.09.28 03:34:06.431598 [ 583 ] {} <Debug> RaftInstance: asynchronously replicated 52, return value 0x5030008273e0
2024.09.28 03:34:06.431686 [ 607 ] {} <Debug> RaftInstance: append_entries for 3 with LastLogIndex=51, LastLogTerm=1, EntriesLength=1, CommitIndex=51, Term=1, peer_last_sent_idx 51
2024.09.28 03:34:06.431817 [ 607 ] {} <Debug> RaftInstance: append_entries for 2 with LastLogIndex=51, LastLogTerm=1, EntriesLength=1, CommitIndex=51, Term=1, peer_last_sent_idx 52
2024.09.28 03:34:06.435829 [ 582 ] {} <Information> RaftInstance: snapshot idx 51 log_term 1 created, compact the log store if needed

The only explanation I see is a small race. If snp_in_progress_ is true, local_snp can change while we run checks before starting snapshot creation.
We need to fetch local_snp again after we are sure we are the only possible thread that is creating a snapshot.

Copy link
Contributor

@greensky00 greensky00 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, yes that explains the issue and the fix looks good to me.
I just did a line breaking, and will merge it as soon as tests pass.

@greensky00 greensky00 merged commit 1106071 into eBay:master Oct 12, 2024
1 check passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants