Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support long marked offset in SlicedInputStream #113629

Merged
merged 12 commits into from
Sep 27, 2024

Conversation

kingherc
Copy link
Contributor

Else, a reset might reset to an unintended offset in a big slice.

Relates ES-9639

Else, a reset might reset to an unintended offset in a big slice.

Relates ES-9639
@kingherc kingherc added >non-issue :Distributed Coordination/Snapshot/Restore Anything directly related to the `_snapshot/*` APIs Team:Distributed Meta label for distributed team (obsolete) auto-backport-and-merge v8.16.0 v9.0.0 labels Sep 26, 2024
@kingherc kingherc self-assigned this Sep 26, 2024
@kingherc kingherc marked this pull request as ready for review September 26, 2024 17:46
@elasticsearchmachine
Copy link
Collaborator

Pinging @elastic/es-distributed (Team:Distributed)

Copy link
Contributor

@DaveCTurner DaveCTurner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fix looks good, I left some suggestions regarding the test.

Copy link
Contributor

@henningandersen henningandersen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, left a couple comments in addition to David's.

@elasticsearchmachine elasticsearchmachine added the serverless-linked Added by automation, don't add manually label Sep 27, 2024
Copy link
Contributor Author

@kingherc kingherc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the feedback @DaveCTurner , @henningandersen ! Handled, feel free to review again.

Comment on lines 170 to 173
if (currentStream == null) {
// In case EOF has been reached, we set the currentStream to a non-null value so that nextStream() does not complain.
currentStream = InputStream.nullInputStream();
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we really need to do this dance just to make nextStream() happy? It seems as if we could just set currentStream to the right slice directly?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Your comment made me thinking and found another non-intrusive way: extend the assertion in nextStream(). Feel free to review!

}
final long skipped = stream.skip(remaining);
currentSliceOffset += skipped;
if (skipped == 0) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm skipped == 0 doesn't necessarily mean we reached end-of-stream. If we haven't quite skipped enough bytes we should try and read() one more byte and only move to the next stream if that returns -1.

I think we also shouldn't keep retrying the skip() if it falls short (unless it reached the end of the stream). It's up to the caller to check the return value and retry if needed.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah good point! Added a read() to ensure we're reaching stream EOF.

I think we also shouldn't keep retrying the skip() if it falls short (unless it reached the end of the stream). It's up to the caller to check the return value and retry if needed.

I'm split on this. Certainly the caller should retry, but I also see implementations with loops. E.g., the default JDK implementation has a loop over read(), instead of a single call to read(). I also feel it's "safer" in case the caller forgets the loop.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah in practice most impls of skip do try and skip all the way, just seems like if one doesn't then it might be useful to propagate that to the caller.

I don't think we want to check skipped == 0 at all tbh - that involves an extra iteration at the end of each stream. Can we just check for EOF if skipped < remaining?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yup, totally makes it better, done!

Copy link
Contributor Author

@kingherc kingherc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @DaveCTurner ! Feel free to review again.

}
final long skipped = stream.skip(remaining);
currentSliceOffset += skipped;
if (skipped == 0) {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah good point! Added a read() to ensure we're reaching stream EOF.

I think we also shouldn't keep retrying the skip() if it falls short (unless it reached the end of the stream). It's up to the caller to check the return value and retry if needed.

I'm split on this. Certainly the caller should retry, but I also see implementations with loops. E.g., the default JDK implementation has a loop over read(), instead of a single call to read(). I also feel it's "safer" in case the caller forgets the loop.

Comment on lines 170 to 173
if (currentStream == null) {
// In case EOF has been reached, we set the currentStream to a non-null value so that nextStream() does not complain.
currentStream = InputStream.nullInputStream();
}
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Your comment made me thinking and found another non-intrusive way: extend the assertion in nextStream(). Feel free to review!

}
final long skipped = stream.skip(remaining);
currentSliceOffset += skipped;
if (skipped == 0) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah in practice most impls of skip do try and skip all the way, just seems like if one doesn't then it might be useful to propagate that to the caller.

I don't think we want to check skipped == 0 at all tbh - that involves an extra iteration at the end of each stream. Can we just check for EOF if skipped < remaining?

Copy link
Contributor Author

@kingherc kingherc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @DaveCTurner ! Feel free to review again.

}
final long skipped = stream.skip(remaining);
currentSliceOffset += skipped;
if (skipped == 0) {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yup, totally makes it better, done!

Copy link
Contributor

@henningandersen henningandersen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good. Just a few more comments (besides David's).

Copy link
Contributor Author

@kingherc kingherc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the feedback @henningandersen , @DaveCTurner ! Feel free to review again.

Copy link
Contributor

@DaveCTurner DaveCTurner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM (one question but it doesn't especially matter either way)

// the skip is performed on the marked slice and no other slices are involved. This may help uncover any bugs.
currentStream.skipNBytes(markedSliceOffset);
} else {
currentStream = null;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can this else branch happen? Isn't markedSlice < numSlices an invariant?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It can happen in the case one marks after reaching EOF. Added a test and made sure it works.

Copy link
Contributor

@henningandersen henningandersen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.

@kingherc
Copy link
Contributor Author

CI failure is #113694 , retrying.

@kingherc kingherc merged commit dfa13ba into elastic:main Sep 27, 2024
16 checks passed
@elasticsearchmachine
Copy link
Collaborator

💚 Backport successful

Status Branch Result
8.x

kingherc added a commit to kingherc/elasticsearch that referenced this pull request Sep 27, 2024
Else, a reset might reset to an unintended offset in a big slice.

Also add support for skip.

Relates ES-9639
elasticsearchmachine pushed a commit that referenced this pull request Sep 27, 2024
Else, a reset might reset to an unintended offset in a big slice.

Also add support for skip.

Relates ES-9639
matthewabbott pushed a commit to matthewabbott/elasticsearch that referenced this pull request Oct 4, 2024
Else, a reset might reset to an unintended offset in a big slice.

Also add support for skip.

Relates ES-9639
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
:Distributed Coordination/Snapshot/Restore Anything directly related to the `_snapshot/*` APIs >non-issue serverless-linked Added by automation, don't add manually Team:Distributed Meta label for distributed team (obsolete) v8.16.0 v9.0.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants