Collect and display execution metadata for ES|QL cross cluster searches #112595

Merged: 70 commits merged on Sep 30, 2024. The diff below shows changes from 52 commits.

Commits
6ebc396
Collect and display execution metadata for ES|QL cross cluster searches
quux00 Sep 6, 2024
512fdec
Slight improvements to EsqlExecutionInfo
quux00 Sep 6, 2024
39428fa
Removed changes to EsqlQueryResponse, spending too long getting the E…
quux00 Sep 6, 2024
892cd99
Starting threading EsqlExecutionInfo into PlanExecutor and EsqlSesssi…
quux00 Sep 6, 2024
fb10109
Have the initial swap-in of cluster info into EsqlExecutionInfo in Es…
quux00 Sep 6, 2024
797cc8c
Added EsqlExecutionInfo to IndexResolver. Enrich pathway passes in nu…
quux00 Sep 6, 2024
fa7bbb0
ComputeListener updated to the version that has proper remote/local s…
quux00 Sep 9, 2024
71a33ed
Added new tests to ComputeListenerTests
quux00 Sep 9, 2024
ab347a6
Added ExecutionInfo to Result obj (used in ComputeService/EsqlSession)
quux00 Sep 9, 2024
c39111b
update ExecutionInfo with shard counts in ComputeService.lookupDataNodes
quux00 Sep 9, 2024
1a3a7f8
Migrated CrossClustersQueryIT to new setup format, but can't add exec…
quux00 Sep 9, 2024
544aaeb
Added CountDown to acquireComputeForDataNodes - that allows SUCCESSFU…
quux00 Sep 10, 2024
ca2de85
Fixed failing REST and qa tests to account for the new 'took' time in…
quux00 Sep 10, 2024
f839132
Fixed bug where CountDown in ComputeService can be initialized with 0…
quux00 Sep 10, 2024
3f3139b
More qa and bwc test fixes based on what failed in latest ci build
quux00 Sep 10, 2024
e090437
Next round of qa and bwc test fixes based on what failed in latest ci…
quux00 Sep 11, 2024
b2b2542
Fix failing test in EsqlSecurityIT
quux00 Sep 11, 2024
5e7876e
Added _cluster/details to the EsqlQueryResponse XContent for cross-cl…
quux00 Sep 11, 2024
3e16fbb
Fixed test failure in esql/ccq/MultiClustersIT
quux00 Sep 11, 2024
ed5b9db
Updated end user docs with info about top level took time and _cluste…
quux00 Sep 11, 2024
661a243
Added EsqlExecutionInfo to equals and hashCode method of EsqlQueryRes…
quux00 Sep 12, 2024
9535bdd
Removed skip_unavilable=true filter in IndexResolver - all clusters a…
quux00 Sep 12, 2024
699b16a
Moved isRemoteUnavailableException to ExceptionsHelper
quux00 Sep 12, 2024
228eed2
Added equals and hashCode to EsqlExecutionInfo.Cluster object.
quux00 Sep 13, 2024
fa9c7c4
Minor tweak to esql-across-clusters.asciidoc
quux00 Sep 13, 2024
c51719e
Improvements to esql-across-clusters.asciidoc
quux00 Sep 13, 2024
d365c37
Update docs/changelog/112595.yaml
quux00 Sep 13, 2024
8688dbf
Added questions about took time headers to EsqlResponseListener - pos…
quux00 Sep 13, 2024
5b93774
Merge remote-tracking branch 'elastic/main' into esql/ccs-execution-i…
quux00 Sep 16, 2024
449e1a7
Merge remote-tracking branch 'elastic/main' into esql/ccs-execution-i…
quux00 Sep 17, 2024
0083ae7
Merge remote-tracking branch 'elastic/main' into esql/ccs-execution-i…
quux00 Sep 18, 2024
5462d6b
PR feedback with focus on end user docs fixes, removing some out-of-d…
quux00 Sep 18, 2024
5f27325
Merge remote-tracking branch 'elastic/main' into esql/ccs-execution-i…
quux00 Sep 18, 2024
9e77c28
Additional PR feedback changes - test adjustments, remove 'set' and '…
quux00 Sep 19, 2024
fd6d3bf
Merge remote-tracking branch 'elastic/main' into esql/ccs-execution-i…
quux00 Sep 19, 2024
6e87174
Now tracking took in nanos, not millis (but XContent still displays i…
quux00 Sep 19, 2024
b474fc7
Merge remote-tracking branch 'elastic/main' into esql/ccs-execution-i…
quux00 Sep 19, 2024
1649962
Merge remote-tracking branch 'elastic/main' into esql/ccs-execution-i…
quux00 Sep 19, 2024
20c9356
Changed ComputeResponse to de/serialize with read/writeOptionalTimeValue
quux00 Sep 19, 2024
e6aa92a
EsqlResponseListener now preferentially uses the took time in the Esq…
quux00 Sep 19, 2024
59d1480
Merge remote-tracking branch 'elastic/main' into esql/ccs-execution-i…
quux00 Sep 20, 2024
8bb1b7f
Merge remote-tracking branch 'elastic/main' into esql/ccs-execution-i…
quux00 Sep 20, 2024
6323cdf
Modified esql-across-clusters to run the new queries I added; but JSO…
quux00 Sep 20, 2024
4875a66
Merge remote-tracking branch 'elastic/main' into esql/ccs-execution-i…
quux00 Sep 20, 2024
24e0c02
Merge remote-tracking branch 'elastic/main' into esql/ccs-execution-i…
quux00 Sep 20, 2024
940ef22
Merge remote-tracking branch 'elastic/main' into esql/ccs-execution-i…
quux00 Sep 20, 2024
617cbec
Merge remote-tracking branch 'elastic/main' into esql/ccs-execution-i…
quux00 Sep 23, 2024
c45d181
Removed code that lists fully resolved indices in the _clusters/detai…
quux00 Sep 23, 2024
13a34de
Merge remote-tracking branch 'elastic/main' into esql/ccs-execution-i…
quux00 Sep 23, 2024
9ac5746
Code cleanup - remove commented out code in IndexResolverTests
quux00 Sep 23, 2024
3afb7a1
PR feedback: Moved logic for unavailable/missing clusters to EsqlSession
quux00 Sep 23, 2024
b118406
Merge remote-tracking branch 'elastic/main' into esql/ccs-execution-i…
quux00 Sep 24, 2024
fc53eb7
PR feedback: I removed acquireCCSCompute and acquireComputeForDatanod…
quux00 Sep 25, 2024
d50658c
Merge remote-tracking branch 'elastic/main' into esql/ccs-execution-i…
quux00 Sep 26, 2024
dc467ac
PR feedback: Created new intf IndicesExpressionResolver and have Remo…
quux00 Sep 26, 2024
711c1f8
Merge remote-tracking branch 'elastic/main' into esql/ccs-execution-i…
quux00 Sep 26, 2024
8e5f170
checkstyle fix
quux00 Sep 27, 2024
838e6a9
Merge remote-tracking branch 'elastic/main' into esql/ccs-execution-i…
quux00 Sep 27, 2024
77cc107
Moved parseClusterAlias from IndexResolver to RemoteClusterAware and …
quux00 Sep 27, 2024
0e71453
Renamed IndicesExpressionResolver intf to IndicesExpressionGrouper.
quux00 Sep 27, 2024
a69f3db
Merge remote-tracking branch 'elastic/main' into esql/ccs-execution-i…
quux00 Sep 27, 2024
a7efbec
PR feedback: Added javadoc to ComputeListener, removed leftover debug…
quux00 Sep 27, 2024
aa8bbaa
Fixed bug where SKIPPED status for unavailable clusters from field-ca…
quux00 Sep 27, 2024
e5e45b5
Merge remote-tracking branch 'elastic/main' into esql/ccs-execution-i…
quux00 Sep 27, 2024
9a304c2
Merge remote-tracking branch 'elastic/main' into esql/ccs-execution-i…
quux00 Sep 30, 2024
d79af98
PR feedback
quux00 Sep 30, 2024
826aab7
Merge remote-tracking branch 'elastic/main' into esql/ccs-execution-i…
quux00 Sep 30, 2024
ec99687
Changed status to SKIPPED when no matching index found for remote clu…
quux00 Sep 30, 2024
02092e9
Merge remote-tracking branch 'elastic/main' into esql/ccs-execution-i…
quux00 Sep 30, 2024
8569bfa
Merge remote-tracking branch 'elastic/main' into esql/ccs-execution-i…
quux00 Sep 30, 2024
7 changes: 6 additions & 1 deletion .idea/inspectionProfiles/Project_Default.xml

Some generated files are not rendered by default.

6 changes: 6 additions & 0 deletions docs/changelog/112595.yaml
@@ -0,0 +1,6 @@
pr: 112595
summary: Collect and display execution metadata for ES|QL cross cluster searches
area: ES|QL
type: enhancement
issues:
- 112402
219 changes: 208 additions & 11 deletions docs/reference/esql/esql-across-clusters.asciidoc
@@ -85,7 +85,7 @@ POST /_security/role/remote1
"privileges": [ "read","read_cross_cluster" ], <4>
"clusters" : ["my_remote_cluster"] <5>
}
-],
+],
"remote_cluster": [ <6>
{
"privileges": [
@@ -100,15 +100,23 @@ POST /_security/role/remote1
----

<1> The `cross_cluster_search` cluster privilege is required for the _local_ cluster.
-<2> Typically, users will have permissions to read both local and remote indices. However, for cases where the role is intended to ONLY search the remote cluster, the `read` permission is still required for the local cluster. To provide read access to the local cluster, but disallow reading any indices in the local cluster, the `names` field may be an empty string.
-<3> The indices allowed read access to the remote cluster. The configured <<security-api-create-cross-cluster-api-key,cross-cluster API key>> must also allow this index to be read.
-<4> The `read_cross_cluster` privilege is always required when using {esql} across clusters with the API key-based security model.
+<2> Typically, users will have permissions to read both local and remote indices. However, for cases where the role
+is intended to ONLY search the remote cluster, the `read` permission is still required for the local cluster.
+To provide read access to the local cluster, but disallow reading any indices in the local cluster, the `names`
+field may be an empty string.
+<3> The indices allowed read access to the remote cluster. The configured
+<<security-api-create-cross-cluster-api-key,cross-cluster API key>> must also allow this index to be read.
+<4> The `read_cross_cluster` privilege is always required when using {esql} across clusters with the API key-based
+security model.
<5> The remote clusters to which these privileges apply.
-This remote cluster must be configured with a <<security-api-create-cross-cluster-api-key,cross-cluster API key>> and connected to the remote cluster before the remote index can be queried.
+This remote cluster must be configured with a <<security-api-create-cross-cluster-api-key,cross-cluster API key>>
+and connected to the remote cluster before the remote index can be queried.
Verify connection using the <<cluster-remote-info, Remote cluster info>> API.
-<6> Required to allow remote enrichment. Without this, the user cannot read from the `.enrich` indices on the remote cluster. The `remote_cluster` security privilege was introduced in version *8.15.0*.
+<6> Required to allow remote enrichment. Without this, the user cannot read from the `.enrich` indices on the
+remote cluster. The `remote_cluster` security privilege was introduced in version *8.15.0*.

-You will then need a user or API key with the permissions you created above. The following example API call creates a user with the `remote1` role.
+You will then need a user or API key with the permissions you created above. The following example API call creates
+a user with the `remote1` role.

[source,console]
----
@@ -119,11 +127,13 @@ POST /_security/user/remote_user
}
----

-Remember that all cross-cluster requests from the local cluster are bound by the cross-cluster API key's privileges, which are controlled by the remote cluster's administrator.
+Remember that all cross-cluster requests from the local cluster are bound by the cross-cluster API key's privileges,
+which are controlled by the remote cluster's administrator.

[TIP]
====
-Cross-cluster API keys created in versions prior to 8.15.0 will need to be replaced or updated to add the new permissions required for {esql} with ENRICH.
+Cross-cluster API keys created in versions prior to 8.15.0 will need to be replaced or updated to add the new permissions
+required for {esql} with ENRICH.
====

[discrete]
@@ -174,6 +184,194 @@ FROM *:my-index-000001
| LIMIT 10
----

[discrete]
[[ccq-cluster-details]]
==== Cross-cluster metadata

{esql} {ccs} responses include metadata about the search on each cluster when the response format is JSON.
Here we show an example using the async search endpoint. {ccs-cap} metadata is also present in responses from the
synchronous search endpoint.

[source,console]
----
POST /_query/async?format=json
{
  "query": """
    FROM my-index-000001,cluster_one:my-index-000001,cluster_two:my-index*
    | STATS COUNT(http.response.status_code) BY user.id
    | LIMIT 2
  """
}
----
// TEST[setup:my_index]
// TEST[s/cluster_one:my-index-000001,cluster_two:my-index//]

Which returns:

[source,console-result]
----
{
  "is_running": false,
  "took": 42, <1>
  "columns" : [
    {
      "name" : "COUNT(http.response.status_code)",
      "type" : "long"
    },
    {
      "name" : "user.id",
      "type" : "keyword"
    }
  ],
  "values" : [
    [4, "elkbee"],
    [1, "kimchy"]
  ],
  "_clusters": { <2>
    "total": 3,
    "successful": 3,
    "running": 0,
    "skipped": 0,
    "partial": 0,
    "failed": 0,
    "details": { <3>
      "(local)": { <4>
        "status": "successful",
        "indices": "my-index-000001",
        "took": 36, <5>
        "_shards": { <6>
          "total": 13,
          "successful": 13,
          "skipped": 0,
          "failed": 0
        }
      },
      "cluster_one": {
        "status": "successful",
        "indices": "cluster_one:my-index-000001",
        "took": 38,
        "_shards": {
          "total": 4,
          "successful": 4,
          "skipped": 0,
          "failed": 0
        }
      },
      "cluster_two": {
        "status": "successful",
        "indices": "cluster_two:my-index-000001", <7>
        "took": 41,
        "_shards": {
          "total": 18,
          "successful": 18,
          "skipped": 1,
          "failed": 0
        }
      }
    }
  }
}
----
// TEST[skip: cross-cluster testing env not set up]
Review comment (Member):
The local cluster should be available, right? Could we remove the multi-cluster output so we get the assertion that the shape is pretty close?

Reply (Contributor Author):
I spent several hours trying but unless you know of a trick to do clever multi-line matching I don't see how this is possible. Among the things I tried was adding "m" to the end of the matcher to indicate multi-line matching (as in Perl matching), but that doesn't work. Mostly I just get failed runs with no information as to what is wrong.

Plus I'm not really sure it's worth it? The whole point of this section is to show the _clusters/details section so testing against a non-CCS set up doesn't seem useful.

We probably need another ticket to enable the multi-cluster testing setup that search-across-clusters.asciidoc uses, as that was not set up for this test.

Reply (Member):
👍
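The reply above reports that adding an `m` flag did not enable multi-line matching. A likely explanation, sketched here with `java.util.regex` under the unconfirmed assumption that the docs-test substitutions are evaluated as Java regular expressions: in both Perl and Java, `m` (`Pattern.MULTILINE`) only changes how `^` and `$` anchor, while letting `.` cross line breaks requires `s` (`Pattern.DOTALL`, or the inline `(?s)` flag). The class name `RegexFlagsDemo` is hypothetical.

```java
import java.util.regex.Pattern;

public class RegexFlagsDemo {
    // True if the regex, compiled with the given flags, finds a match anywhere in the input.
    static boolean found(String regex, int flags, String input) {
        return Pattern.compile(regex, flags).matcher(input).find();
    }

    public static void main(String[] args) {
        // Two lines of response text; the pattern must cross the line break to match.
        String body = "\"took\": 42,\n\"columns\": [";
        String regex = "took.*columns";
        // Default: "." does not match a line terminator, so there is no match.
        System.out.println(found(regex, 0, body));                 // false
        // MULTILINE (Perl /m, inline "(?m)") only changes what ^ and $ anchor to.
        System.out.println(found(regex, Pattern.MULTILINE, body)); // false
        // DOTALL (Perl /s, inline "(?s)") lets "." match line terminators.
        System.out.println(found(regex, Pattern.DOTALL, body));    // true
    }
}
```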


<1> How long the entire search (across all clusters) took, in milliseconds.
<2> This section of counters shows all possible cluster search states and how many cluster
searches are currently in that state. The clusters can have one of the following statuses: *running*,
*successful* (searches on all shards were successful), *skipped* (the search
failed on a cluster marked with `skip_unavailable`=`true`) or *failed* (the search
failed on a cluster marked with `skip_unavailable`=`false`).
<3> The `_clusters/details` section shows metadata about the search on each cluster.
<4> If your {ccs} includes indices from the local cluster (the cluster you sent the request to),
that cluster is identified as "(local)".
<5> How long (in milliseconds) the search took on each cluster. This can be useful to determine
which clusters have slower response times than others.
<6> The shard details for the search on that cluster, including a count of shards that were
skipped because the can-match phase determined they held no matching data and therefore did not
need to be included in the full {esql} query.
<7> The index expression supplied by the user. If you provide a wildcard such as `my-index*`,
this field shows the resolved index name(s), unless no matching indices could be found on
that cluster, in which case the original wildcard expression is retained.
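A client can recompute the summary counters and compare per-cluster `took` values from the `details` map described in the callouts above. The following is a minimal Java sketch under stated assumptions: the response JSON has already been parsed, and `ClusterCounters`, `Status`, and `ClusterDetail` are hypothetical types for illustration only, not Elasticsearch classes or the `EsqlExecutionInfo` API from this PR.

```java
import java.util.EnumMap;
import java.util.LinkedHashMap;
import java.util.Map;

public class ClusterCounters {
    // Hypothetical statuses, mirroring the counter names in the _clusters section.
    enum Status { RUNNING, SUCCESSFUL, SKIPPED, PARTIAL, FAILED }

    // Hypothetical parsed form of one entry under _clusters.details.
    record ClusterDetail(Status status, long tookMillis) {}

    // Counts clusters per status; every status starts at zero, as in the response.
    static Map<Status, Integer> countByStatus(Map<String, ClusterDetail> details) {
        Map<Status, Integer> counts = new EnumMap<>(Status.class);
        for (Status s : Status.values()) {
            counts.put(s, 0);
        }
        for (ClusterDetail d : details.values()) {
            counts.merge(d.status(), 1, Integer::sum);
        }
        return counts;
    }

    // Returns the alias with the largest per-cluster "took", or null if details is empty.
    static String slowestCluster(Map<String, ClusterDetail> details) {
        String slowest = null;
        long maxTook = Long.MIN_VALUE;
        for (Map.Entry<String, ClusterDetail> e : details.entrySet()) {
            if (e.getValue().tookMillis() > maxTook) {
                maxTook = e.getValue().tookMillis();
                slowest = e.getKey();
            }
        }
        return slowest;
    }

    public static void main(String[] args) {
        // Values taken from the first example response above.
        Map<String, ClusterDetail> details = new LinkedHashMap<>();
        details.put("(local)", new ClusterDetail(Status.SUCCESSFUL, 36));
        details.put("cluster_one", new ClusterDetail(Status.SUCCESSFUL, 38));
        details.put("cluster_two", new ClusterDetail(Status.SUCCESSFUL, 41));
        // "total" is simply the number of entries in details.
        // Prints: total=3 {RUNNING=0, SUCCESSFUL=3, SKIPPED=0, PARTIAL=0, FAILED=0}
        System.out.println("total=" + details.size() + " " + countByStatus(details));
        // Prints: slowest=cluster_two
        System.out.println("slowest=" + slowestCluster(details));
    }
}
```

In this sketch the counters are recomputed from `details` purely for illustration; a client would normally read the counters the server already returns.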


The cross-cluster metadata can be used to determine whether any data came back from a cluster.
For instance, in the response to the query below, the wildcard expression for `cluster_two` did not
resolve to a concrete index (or indices), and the total number of shards searched is
zero. This indicates that no matching index was found on that cluster. Because the other
cluster did have a matching index, the search did not return an error; instead, it
returned all the matching data it could find.


[source,console]
----
POST /_query/async?format=json
{
  "query": """
    FROM cluster_one:my-index*,cluster_two:logs*
    | STATS COUNT(http.response.status_code) BY user.id
    | LIMIT 2
  """
}
----
// TEST[continued]
// TEST[s/cluster_one:my-index\*,cluster_two:logs\*/my-index-000001/]

Which returns:

[source,console-result]
----
{
  "is_running": false,
  "took": 55,
  "columns": [
    ... // not shown
  ],
  "values": [
    ... // not shown
  ],
  "_clusters": {
    "total": 2,
    "successful": 2,
    "running": 0,
    "skipped": 0,
    "partial": 0,
    "failed": 0,
    "details": {
      "cluster_one": {
        "status": "successful",
        "indices": "cluster_one:my-index-000001",
        "took": 38,
        "_shards": {
          "total": 4,
          "successful": 4,
          "skipped": 0,
          "failed": 0
        }
      },
      "cluster_two": {
        "status": "successful", <1>
        "indices": "cluster_two:logs*", <2>
        "took": 0,
        "_shards": {
          "total": 0, <3>
          "successful": 0,
          "skipped": 0,
          "failed": 0
        }
      }
    }
  }
}
----
// TEST[skip: cross-cluster testing env not set up]

<1> This search is still marked as successful, even though no data was searched.
<2> Since there were no matching indices for the wildcard pattern provided, the original
index expression provided by the user is retained here.
<3> Indicates that no shards were searched (due to not having any matching indices).
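The check described for this example (a cluster reported as successful but with zero shards searched) can be sketched in Java. This assumes the `_clusters.details` entries have already been parsed into the hypothetical `ClusterDetail` record below; `EmptyClusterCheck` and `indicesForDisplay` are illustrative names, and `indicesForDisplay` only mimics the documented rule for the `indices` field. Joining several resolved names with commas is an assumption, not something this page confirms.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class EmptyClusterCheck {
    // Hypothetical parsed form of one _clusters.details entry (only the fields used here).
    record ClusterDetail(String status, String indices, int totalShards) {}

    // A cluster marked "successful" that searched zero shards returned no data;
    // per the docs, this happens when its index expression matched no indices.
    static List<String> clustersWithNoData(Map<String, ClusterDetail> details) {
        List<String> result = new ArrayList<>();
        for (Map.Entry<String, ClusterDetail> e : details.entrySet()) {
            ClusterDetail d = e.getValue();
            if ("successful".equals(d.status()) && d.totalShards() == 0) {
                result.add(e.getKey());
            }
        }
        return result;
    }

    // Mimics the documented rule for the "indices" field: show the resolved names,
    // or keep the user's original expression if nothing resolved.
    // Joining several names with "," is an assumption made for this sketch.
    static String indicesForDisplay(String originalExpression, List<String> resolved) {
        return resolved.isEmpty() ? originalExpression : String.join(",", resolved);
    }

    public static void main(String[] args) {
        // Values modeled on the second example response above.
        Map<String, ClusterDetail> details = new LinkedHashMap<>();
        details.put("cluster_one", new ClusterDetail("successful",
                indicesForDisplay("cluster_one:my-index*", List.of("cluster_one:my-index-000001")), 4));
        details.put("cluster_two", new ClusterDetail("successful",
                indicesForDisplay("cluster_two:logs*", List.of()), 0));
        System.out.println(details.get("cluster_two").indices()); // cluster_two:logs*
        System.out.println(clustersWithNoData(details));          // [cluster_two]
    }
}
```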




Review comment (Contributor Author):
@naj-h and @tylerperk - Please review proposed end user docs changes.

[discrete]
[[ccq-enrich]]
==== Enrich across clusters
@@ -331,8 +529,7 @@ setting. As a result, if a remote cluster specified in the request is
unavailable or failed, {ccs} for {esql} queries will fail regardless of the setting.

We are actively working to align the behavior of {ccs} for {esql} with other
-{ccs} APIs. This includes providing detailed execution information for each cluster
-in the response, such as execution time, selected target indices, and shards.
+{ccs} APIs.

[discrete]
[[ccq-during-upgrade]]
5 changes: 4 additions & 1 deletion docs/reference/esql/esql-rest.asciidoc
@@ -192,6 +192,7 @@ Which returns:
[source,console-result]
----
{
"took": 28,
"columns": [
{"name": "author", "type": "text"},
{"name": "name", "type": "text"},
@@ -206,6 +207,7 @@ Which returns:
]
}
----
// TESTRESPONSE[s/"took": 28/"took": "$body.took"/]

[discrete]
[[esql-locale-param]]
@@ -384,12 +386,13 @@ GET /_query/async/FmNJRUZ1YWZCU3dHY1BIOUhaenVSRkEaaXFlZ3h4c1RTWFNocDdnY2FSaERnUT
// TEST[skip: no access to query ID - may return response values]

If the response's `is_running` value is `false`, the query has finished
-and the results are returned.
+and the results are returned, along with the `took` time for the query.

[source,console-result]
----
{
"is_running": false,
"took": 48,
"columns": ...
}
----