Define how semantic convention fields should be mapped #2375

Open
felixbarny opened this issue Sep 2, 2024 · 0 comments

ECS and OpenTelemetry Semantic Conventions are merging, which is great.

So far, we've mostly been working on adding ECS fields to SemConv, and are discussing how to resolve discrepancies between fields with different names but similar semantics.

As also noted in the contribution guidelines, we expect more and more of the schema work to be done in OpenTelemetry Semantic Conventions. However, we haven't yet defined a mechanism for deciding how SemConv fields should be mapped to Elasticsearch. For example, whether a SemConv string field should be mapped as keyword, as *text, or as a combination of both (via multi-fields). This is inherently Elasticsearch-specific, so it is not really appropriate for upstream SemConv.
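To make the choice concrete, here is a rough sketch of the candidate mappings for a single string field, written as Python dicts that mirror the Elasticsearch mapping JSON. The field name `example.attribute` and the helper `properties_for` are made up for illustration, and using `match_only_text` for the multi-field is an assumption rather than a decision.

```python
# Candidate Elasticsearch mappings for one SemConv string attribute.
# All names here are illustrative; nothing in this block is a decided convention.

# Option 1: exact-match only (filtering, aggregations, sorting).
KEYWORD_ONLY = {"type": "keyword"}

# Option 2: full-text search only.
TEXT_ONLY = {"type": "text"}

# Option 3: keyword as the main field plus a text multi-field.
# Choosing match_only_text for the sub-field is an assumption.
KEYWORD_WITH_TEXT = {
    "type": "keyword",
    "fields": {"text": {"type": "match_only_text"}},
}


def properties_for(field_name, mapping):
    """Wrap a single field mapping in a `properties` block (hypothetical helper)."""
    return {"properties": {field_name: mapping}}


if __name__ == "__main__":
    import json

    print(json.dumps(properties_for("example.attribute", KEYWORD_WITH_TEXT), indent=2))
```

With option 3, the field would be queryable as `example.attribute` for exact matches and as `example.attribute.text` for full-text search.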

I think this is what ECS should evolve to: ECS can provide us with a workflow, and with the tooling, to decide how a field that's added to SemConv should be mapped in Elasticsearch.

We should think about how we can streamline that process as much as possible, and build automation to propose a mapping given the field type and name, as defined by SemConv. This could alleviate some of the manual burden of deciding on the most appropriate field type in ES and ensure that we can also deal well with data that's not (yet) part of ECS or SemConv. Some options that come to mind for this:

  • Dynamic mapping based on defining consistent naming conventions, similar to what ecs@mappings is doing. See also Making all *.name fields be multi-field #2118.
  • OTLP doesn't define complex types like IP or geo location. This makes it more difficult to choose the most appropriate ES field type. While naming conventions, like *_ip and *.ip, can help, there's a risk of both false positives and false negatives. We could discuss adding these types to OTel SemConv, or adding some kind of type hints to the string type.
  • For metrics, rely on OTLP metadata to dynamically map metrics, without having to manually create index templates that define time_series_metric and time_series_dimension, and the type (histogram, long, float, aggregate_metric_double, ...).
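As a rough sketch of what such automation could look like: the functions below propose an ES mapping from a field's name and SemConv type, and from OTLP metric metadata. Everything here is hypothetical, including the function names, the rule tables, the defaults (for example `double` for floating-point values), and the choice to map OTLP summaries to `aggregate_metric_double`. It is meant to illustrate the idea, not to describe what ecs@mappings does today.

```python
from fnmatch import fnmatchcase

# Hypothetical naming rules for string fields, checked in order; first match wins.
NAME_RULES = [
    ("*.ip", {"type": "ip"}),
    ("*_ip", {"type": "ip"}),
    ("*.name", {"type": "keyword", "fields": {"text": {"type": "match_only_text"}}}),
]

# Hypothetical default ES type per SemConv attribute type.
TYPE_DEFAULTS = {
    "string": {"type": "keyword"},
    "int": {"type": "long"},
    "double": {"type": "double"},
    "boolean": {"type": "boolean"},
}


def propose_field_mapping(name, semconv_type):
    """Propose an ES mapping from a field name and its SemConv type.

    Returns None when the SemConv type is not covered by this sketch.
    """
    if semconv_type == "string":
        for pattern, mapping in NAME_RULES:
            if fnmatchcase(name, pattern):
                return mapping
    return TYPE_DEFAULTS.get(semconv_type)


def propose_metric_mapping(kind, value_type="double", monotonic=False):
    """Propose an ES mapping from OTLP metric metadata.

    kind is one of "gauge", "sum", "histogram", "summary".
    """
    if kind == "histogram":
        return {"type": "histogram"}
    if kind == "summary":
        # Assumption: keep only sum and count of the summary.
        return {
            "type": "aggregate_metric_double",
            "metrics": ["sum", "value_count"],
            "default_metric": "value_count",
        }
    es_type = "long" if value_type == "int" else "double"
    # Only monotonic sums behave like counters; everything else is treated as a gauge.
    role = "counter" if kind == "sum" and monotonic else "gauge"
    return {"type": es_type, "time_series_metric": role}


if __name__ == "__main__":
    print(propose_field_mapping("client.ip", "string"))
    print(propose_field_mapping("client.address", "string"))
    print(propose_metric_mapping("sum", "int", monotonic=True))
```

A name-only rule also shows the false-negative risk mentioned above: a field such as `client.address` may carry an IP address, but it matches none of the `*.ip` / `*_ip` patterns and falls back to `keyword`.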

To differentiate between "actual" ECS fields and those coming from SemConv, we could introduce a semconv level, next to core and extended.

This somewhat overlaps with how we should map OTel attributes that are unknown (generic/custom/ad-hoc schema) or well-known but not defined by SemConv. For example, receivers from collector-components or shared processing templates that extract fields from plain-text logs.

There are other related questions about the long-term future of ECS, like whether the name ECS still makes sense if its purpose is to define ES mappings for SemConv, or what other purposes ECS should serve, such as defining aliasing/conversion between ECS and SemConv, or providing Elastic-specific fields on top of SemConv. But these questions deserve their own, separate discussion.
