Define how semantic convention fields should be mapped #2375

felixbarny · 2024-09-02T16:07:11Z

ECS and OpenTelemetry Semantic Conventions are merging, which is great.

So far, we've mostly been working on adding ECS fields to SemConv, and are discussing how to resolve discrepancies between fields with different names but similar semantics.

As also noted in the contribution guidelines, we expect more and more of the schema work to be done in OpenTelemetry Semantic Conventions. However, we haven't yet defined a mechanism for deciding how SemConv fields should be mapped in Elasticsearch. For example, whether a SemConv string field should be mapped as keyword, as *text, or a combination of both (via multi-fields). This is inherently Elasticsearch-specific, so not really appropriate for upstream SemConv.
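To make the three options concrete, here is a minimal sketch of what each mapping could look like for a single string attribute, written as Python dicts that mirror the JSON mapping body. The attribute name `example.message`, the `ignore_above` value, and the choice of `match_only_text` for the text variant are assumptions for illustration, not a proposal for any particular field.

```python
import json

# Three candidate Elasticsearch mappings for one SemConv string attribute.
# "example.message" is a hypothetical attribute name; ignore_above=1024 is
# an illustrative value.
KEYWORD_ONLY = {"type": "keyword", "ignore_above": 1024}
TEXT_ONLY = {"type": "match_only_text"}
MULTI_FIELD = {
    # Exact-match and aggregations on the main field ...
    "type": "keyword",
    "ignore_above": 1024,
    # ... plus full-text search on the "<field>.text" sub-field.
    "fields": {"text": {"type": "match_only_text"}},
}


def index_mappings(field_name, field_mapping):
    """Wrap a single field mapping into a `mappings` body."""
    return {"properties": {field_name: field_mapping}}


if __name__ == "__main__":
    print(json.dumps(index_mappings("example.message", MULTI_FIELD), indent=2))
```

Which variant fits depends on how the field is queried, which is exactly the decision that currently has no defined process.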

I think this is what ECS should evolve to: ECS can provide us with a workflow, and with the tooling, to decide how a field that's added to SemConv should be mapped in Elasticsearch.

We should think about how we can streamline that process as much as possible, and build automation to propose a mapping given the field type and name, as defined by SemConv. This could alleviate some of the manual burden of deciding on the most appropriate field type in ES and ensure that we can also deal well with data that's not (yet) part of ECS or SemConv. Some options that come to mind for this:

- Dynamic mapping based on consistent naming conventions, similar to what ecs@mappings is doing. See also Making all *.name fields be multi-field #2118.
- OTLP doesn't define complex types like IP or geo location. This makes it more difficult to choose the most appropriate ES field type. While naming conventions like *_ip and *.ip can help, there's a risk of both false positives and false negatives. We could discuss adding these types to OTel SemConv, or adding some kind of type hint to the string type.
- For metrics, rely on OTLP metadata to dynamically map metrics, without having to manually create index templates that define time_series_metric and time_series_dimension, and the type (histogram, long, float, aggregate_metric_double, ...).
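As a rough illustration of the options above, the following sketch proposes an ES mapping from a field's name and SemConv type, and a metric mapping from OTLP metadata. Everything in it is an assumption made for the sake of the example: the function names, the regular expressions, the type defaults, and the rule that non-monotonic sums become gauges. It is not what ecs@mappings does today.

```python
import re

# Hypothetical naming conventions, checked in order for string attributes.
NAME_RULES = [
    # Matches "ip", "*.ip" and "*_ip", but not words like "membership".
    (re.compile(r"(^|[._])ip$"), {"type": "ip"}),
    # "*.name" fields get a keyword with a full-text sub-field.
    (re.compile(r"\.name$"),
     {"type": "keyword", "fields": {"text": {"type": "match_only_text"}}}),
]

# Assumed defaults per SemConv attribute type.
TYPE_DEFAULTS = {
    "string": {"type": "keyword", "ignore_above": 1024},
    "int": {"type": "long"},
    "double": {"type": "double"},
    "boolean": {"type": "boolean"},
}


def propose_mapping(name, semconv_type):
    """Propose an ES field mapping from a field name and SemConv type."""
    if semconv_type == "string":
        for pattern, mapping in NAME_RULES:
            if pattern.search(name):
                return dict(mapping)
    # Unknown types fall back to keyword in this sketch.
    return dict(TYPE_DEFAULTS.get(semconv_type, {"type": "keyword"}))


def propose_metric_mapping(otlp_kind, value_type, is_monotonic=False):
    """Propose an ES metric mapping from OTLP metric metadata."""
    if otlp_kind == "histogram":
        return {"type": "histogram"}
    if otlp_kind == "summary":
        return {
            "type": "aggregate_metric_double",
            "metrics": ["sum", "value_count"],
            "default_metric": "value_count",
        }
    es_type = "long" if value_type == "int" else "double"
    # Assumption: only monotonic sums are counters; the rest are gauges.
    is_counter = otlp_kind == "sum" and is_monotonic
    return {"type": es_type, "time_series_metric": "counter" if is_counter else "gauge"}
```

The name rules show where the false positives and false negatives come from: a string attribute ending in `.ip` that carries a hostname would be mapped as `ip`, while an IP address stored under a name like `peer.address` would stay a keyword. Type hints in SemConv would remove that guesswork.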

To differentiate between "actual" ECS fields and those coming from SemConv, we could introduce a semconv level, next to core and extended.

This somewhat overlaps with how we should map OTel attributes that are unknown (generic/custom/ad-hoc schema) or well-known but not defined by SemConv. For example, receivers from collector-components or shared processing templates that extract fields from plain-text logs.

There are other related questions about the long-term future of ECS, like whether the name ECS still makes sense if its purpose is to define ES mappings for SemConv, or what other purposes ECS should serve, such as defining aliasing/conversion between ECS and SemConv, or providing Elastic-specific fields on top of SemConv. But these questions deserve their own, separate discussion.

felixbarny added the discuss label Sep 2, 2024

AlexanderWert assigned felixbarny Sep 9, 2024

AlexanderWert added the deliverable label Sep 9, 2024

felixbarny mentioned this issue Sep 23, 2024

Extend ecs@mappings to handle type coercion elastic/elasticsearch#113124

Closed
