Fix race condition when multiple nodes reconcile S3 snapshots #10979

Open: brandond wants to merge 1 commit into master from fix-s3-etcdsnapshotfile-race

@brandond (Member) commented Oct 2, 2024

Proposed Changes

Fix race condition when multiple nodes reconcile S3 snapshots

Don't delete S3 etcdsnapshotfiles that are missing from S3 but less than a minute old. It's possible that another node just finished uploading the snapshot and the object key has not yet become visible.

This is most easily reproduced when an on-demand etcd snapshot is requested via Rancher, because that triggers simultaneous snapshots on all etcd nodes. Every node then tries to reconcile S3 objects against the etcdsnapshotfile list at the same time, even though the nodes do not necessarily have a consistent view of the bucket.
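For reviewers, the grace-period rule described above can be sketched roughly as follows. This is an illustration only, not the code in this PR: `snapshotRecord`, `staleRecords`, `gracePeriod`, and `demo` are hypothetical names, and the bucket listing is modeled as a plain map rather than a real S3 client call.

```go
package main

import (
	"fmt"
	"time"
)

// snapshotRecord is a hypothetical stand-in for an S3-backed
// etcdsnapshotfile entry; it is not the type used in k3s.
type snapshotRecord struct {
	Name      string
	CreatedAt time.Time
}

// gracePeriod mirrors the "less than a minute old" rule from the description.
const gracePeriod = time.Minute

// staleRecords returns the names of records that are absent from the bucket
// listing AND at least gracePeriod old. Younger records are kept, because
// another node may have just uploaded the object and this node's listing
// may not show the key yet.
func staleRecords(records []snapshotRecord, inBucket map[string]bool, now time.Time) []string {
	var stale []string
	for _, r := range records {
		if inBucket[r.Name] {
			continue // object is visible in the bucket, nothing to reconcile
		}
		if now.Sub(r.CreatedAt) < gracePeriod {
			continue // too new to trust the listing, skip deletion for now
		}
		stale = append(stale, r.Name)
	}
	return stale
}

// demo builds a small fixed scenario (assumed data) and returns the records
// that would be deleted.
func demo() []string {
	now := time.Date(2024, 10, 2, 12, 0, 0, 0, time.UTC)
	records := []snapshotRecord{
		{Name: "present", CreatedAt: now.Add(-10 * time.Minute)},
		{Name: "fresh-missing", CreatedAt: now.Add(-20 * time.Second)},
		{Name: "old-missing", CreatedAt: now.Add(-5 * time.Minute)},
	}
	inBucket := map[string]bool{"present": true}
	return staleRecords(records, inBucket, now)
}

func main() {
	fmt.Println(demo()) // prints [old-missing]
}
```

In this sketch only `old-missing` is selected for deletion; `fresh-missing` survives until a later reconcile pass, by which time the object should either be visible or the record will have aged past the grace period.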

Types of Changes

bugfix

Verification

See linked issue

Testing

Linked Issues

User-Facing Change


Further Comments

codecov bot commented Oct 2, 2024

Codecov Report

Attention: Patch coverage is 28.00000% with 18 lines in your changes missing coverage. Please review.

Project coverage is 43.78%. Comparing base (0942e6a) to head (12020bc).

| Files with missing lines | Patch % | Lines |
|---|---|---|
| pkg/etcd/snapshot.go | 13.33% | 13 Missing ⚠️ |
| pkg/etcd/s3/s3.go | 50.00% | 2 Missing and 3 partials ⚠️ |

❗ There is a different number of reports uploaded between BASE (0942e6a) and HEAD (12020bc). HEAD has 1 upload less than BASE.

| Flag | BASE (0942e6a) | HEAD (12020bc) |
|---|---|---|
| e2etests | 7 | 6 |
Additional details and impacted files
@@            Coverage Diff             @@
##           master   #10979      +/-   ##
==========================================
- Coverage   49.95%   43.78%   -6.18%     
==========================================
  Files         178      178              
  Lines       14801    14812      +11     
==========================================
- Hits         7394     6485     -909     
- Misses       6056     7126    +1070     
+ Partials     1351     1201     -150     

| Flag | Coverage Δ |
|---|---|
| e2etests | 36.12% <0.00%> (-10.05%) ⬇️ |
| inttests | 36.73% <8.00%> (-0.03%) ⬇️ |
| unittests | 13.52% <20.00%> (-0.03%) ⬇️ |

Flags with carried forward coverage won't be shown.


@brandond brandond force-pushed the fix-s3-etcdsnapshotfile-race branch from 81276a4 to e5181f3 Compare October 3, 2024 20:16
Don't delete S3 etcdsnapshotfiles that are missing from S3 but less than a minute old. It's possible that another node just finished uploading the snapshot and the object key has not yet become visible.

Signed-off-by: Brad Davidson <[email protected]>
@brandond brandond force-pushed the fix-s3-etcdsnapshotfile-race branch from e5181f3 to 12020bc Compare October 3, 2024 21:35