Add link to MAX_RETRY allocation explain message #113657
Conversation
Some small comments. Also, you need to run `./gradlew precommit` and fix up the issues.
If no other `no` decisions are present, then the transient allocation issue
that caused these failures has most likely been resolved, and you can use the
<<cluster-reroute,the cluster reroute API>> to retry allocation.
I think this will be confusing, there are normally always some `no` decisions, e.g. for nodes in the wrong data tier.
Also I'd rather we used the imperative voice: "use the reroute API" rather than just suggesting "you can ...".
Finally there's a duplicate "the" (one inside the link and one outside).
Brainstorming, I might say:

Elasticsearch queues shard allocation retries in batches. If there are long running or a high quantity of shard recoveries occurring within the cluster, this process may time out for some shards resulting in `MAX_RETRY`. This surfaces infrequently but is expected to prevent infinite retries which may impact cluster performance. When encountered, run <<cluster-reroute,the cluster reroute API>> to retry allocation.
Changed this to:

Elasticsearch queues shard allocation retries in batches. If there are long-running shard
recoveries or a high quantity of shard recoveries occurring within the cluster, this
process may time out for some shards, resulting in `max_retry`. This surfaces infrequently
but is expected to prevent infinite retries which may impact cluster performance. When
encountered, run the <<cluster-reroute,cluster reroute>> API to retry allocation.
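The remedy the doc text describes can be sketched as building the reroute call below. The `/_cluster/reroute` endpoint and its `retry_failed` query parameter are the real Elasticsearch API; the host URL is an assumption for illustration:

```python
# Sketch: building the cluster reroute call that retries allocation for
# shards blocked by max_retry. The endpoint and the `retry_failed`
# parameter are real Elasticsearch API; the host is an assumption.
from urllib.parse import urlencode

def reroute_retry_request(host: str = "http://localhost:9200") -> tuple[str, str]:
    # retry_failed=true resets the failure counter and retries allocation
    # for shards that stopped being allocated after too many failures.
    query = urlencode({"retry_failed": "true"})
    return "POST", f"{host}/_cluster/reroute?{query}"

method, url = reroute_retry_request()
print(method, url)  # POST http://localhost:9200/_cluster/reroute?retry_failed=true
```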
This is basically identical, but I tweaked the wording of the second sentence because I thought it sounded a bit clearer that way, and also moved "the" and "API" out of the link per the suggestion from @DaveCTurner.
Thanks!
server/src/main/resources/org/elasticsearch/common/reference-docs-links.json (outdated comment, resolved)
Yikes about all those commits. I did not rebase this correctly.
Force-pushed the …ation_max_retries_doc branch from 17f738f to 74905ea.
Pinging @elastic/es-docs (Team:Docs)
Pinging @elastic/es-distributed (Team:Distributed)
the <<cluster-reroute,cluster reroute>> API to retry allocation.
Elasticsearch queues shard allocation retries in batches. If there are long-running shard
recoveries or a high quantity of shard recoveries occurring within the cluster, this
process may time out for some shards, resulting in `max_retry`. This surfaces infrequently
This isn't true, there's no timeout in play here. You need to get 5 genuine failures in a row before you see this.
I was thinking of changing this to
When Elasticsearch is unable to allocate a shard, it will attempt to retry allocation up to the
maximum number of retries allowed. After this, Elasticsearch will stop attempting to allocate
the shard in order to prevent infinite retries which may impact cluster performance. Run the
<<cluster-reroute,cluster reroute>> API to retry allocation, which will allocate the shard if the
issue preventing allocation has been resolved.
Are there any tweaks you'd like to make? Does that seem reasonable?
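The retry-limit behaviour the proposed wording describes can be sketched as below. This is not Elasticsearch's actual Java implementation; the default limit of 5 matches the "5 genuine failures in a row" noted earlier in this thread, and the reset-on-reroute behaviour matches the proposed doc text:

```python
# Sketch of the retry limit described above (not the real Java decider).
# The default of 5 is the limit mentioned in this discussion.
MAX_RETRIES = 5

def allocation_decision(failed_attempts: int, retry_failed: bool = False) -> str:
    # A reroute with ?retry_failed=true resets the failure counter, so
    # allocation is attempted again even after the limit was reached.
    if retry_failed:
        failed_attempts = 0
    if failed_attempts >= MAX_RETRIES:
        return "NO (max_retry)"  # stop retrying to protect the cluster
    return "YES"

print(allocation_decision(5))                     # limit reached
print(allocation_decision(5, retry_failed=True))  # after cluster reroute
```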
...n/generated/org/elasticsearch/xpack/esql/expression/function/scalar/math/HypotEvaluator.java (outdated comment, resolved)
LGTM
Backport of elastic#113657 to `8.x`. Co-authored-by: matthewabbott <[email protected]>
Adds a "maximum number of retries exceeded" reference link to the `max_retry` allocation explanation string.
Adds more detail to the documentation page describing that this was done to protect the cluster, but the real cause of the issue may now be gone and so allocation can be retried.
Also adds `POST` to the example `_cluster/reroute` API call in the explanation, because some customers would use `GET` and be confused why it didn't work.
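The `GET` confusion mentioned above can be illustrated with a minimal sketch. The `check_reroute_method` helper is hypothetical; the underlying point is that reroute mutates cluster state and is therefore POST-only, so a `GET` is rejected rather than silently doing nothing:

```python
# Hypothetical sketch of why the docs example now shows POST: the reroute
# endpoint mutates cluster state and only accepts POST, so a GET is
# rejected (405 Method Not Allowed) instead of retrying allocation.
def check_reroute_method(method: str) -> int:
    if method.upper() != "POST":
        return 405  # Method Not Allowed: nothing was retried
    return 200      # request accepted

print(check_reroute_method("GET"))   # the confusing case
print(check_reroute_method("POST"))  # what the docs example now shows
```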