fix(collector): actually enable the deployment of PrometheusRule when using PodMonitor #1416
base: main
Conversation
…PodMonitor PodMonitor was not being deployed because this logic was not in place. open-telemetry#1379
…loyed this fixes a regression introduced by myself in my last commit open-telemetry#1379
this fixes a regression i introduced related to rendering the PrometheusRule conditions. it should now render in a wide array of scenarios. open-telemetry#1379
i didn't realize it at first, but i think i introduced a regression initially in #1415. i think testing should be expanded to validate that the rendered output matches the input values for these specific scenarios; that's made a bit trickier by the helper here, which can obscure user input a little, but i understand why it exists.
@@ -1,4 +1,4 @@
-{{- if and .Values.prometheusRule.enabled .Values.serviceMonitor.enabled }}
+{{- if and .Values.prometheusRule.enabled (or (eq (include "opentelemetry-collector.serviceEnabled" .) "true") (or .Values.serviceMonitor.enabled .Values.podMonitor.enabled)) }}
If a user wants to render the PrometheusRule, they should be able to, so long as they have enabled the ServiceMonitor or PodMonitor, and in the case of the ServiceMonitor they have set Values.service.enabled == true.
@crutonjohn this sentence reads as if the condition should be
if (or .Values.podMonitor.enabled (and .Values.serviceMonitor.enabled (eq (include "opentelemetry-collector.serviceEnabled" .) "true")))
in hindsight i probably shouldn't have tried to gotmpl that late in the day 😅
i think i mostly have it figured out:
-{{- if and .Values.prometheusRule.enabled (or (eq (include "opentelemetry-collector.serviceEnabled" .) "true") (or .Values.serviceMonitor.enabled .Values.podMonitor.enabled)) }}
+{{- if and .Values.prometheusRule.enabled (or .Values.podMonitor.enabled (and (eq (include "opentelemetry-collector.serviceEnabled" .) "true") .Values.serviceMonitor.enabled)) }}
however, with this logic a user could use these values:
mode: "deployment"
image:
repository: "otel/opentelemetry-collector-k8s"
prometheusRule:
enabled: true
ports:
metrics:
enabled: true
podMonitor:
enabled: false
serviceMonitor:
enabled: false
service:
enabled: false
and believe they are going to get a PrometheusRule, but instead get no PrometheusRule/ServiceMonitor/PodMonitor because neither service nor pod monitors are enabled.
would it be considered acceptable UX to prevent the user from getting a PrometheusRule CR when they explicitly say they want one? or should we remain somewhat opinionated in that you should only be deploying a PrometheusRule CR when you have a Pod/ServiceMonitor getting deployed in tandem with said PrometheusRule?
happy to accommodate either way, i just don't know what's best for the project.
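fwiw, a quick way to sanity-check which of these objects a given values file actually renders (chart path and values file name here are just illustrative, assuming you run it from the repo root):

helm template otelcol charts/opentelemetry-collector \
  -f my-values.yaml \
  | grep -B 2 -A 2 'kind: PrometheusRule'

an empty result for the values above would confirm the "no PrometheusRule" outcome described.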
Hmm, that's a good point.
Is there any harm in creating a PrometheusRule even if the pod/service monitors don't exist? What will happen if something tries to use the PrometheusRule but the monitors don't exist?
i'd have to do some verification, but iirc a PrometheusRule is just what it sounds like: it effectively creates the rule(s) contained within the file. there is some nuance in how prometheus operator is deployed that can affect how the PrometheusRule "lands" on the cluster, for example if you are running multiple prometheus instances in the same cluster scraping different targets, but the principle is still the same.
the rules in the collector chart are generic enough (read: low granularity, since the promql omits label selectors) that a metric like otelcol_exporter_send_failed_log_records could exist in the TSDB from a source that isn't our deployed collector (or the user could have this chart deployed multiple times for different purposes), so users could hypothetically get alerts for something unrelated to our collector deployment.
this would be kind of a "perfect storm" scenario, to be fair, but it's definitely possible; i've seen some pretty funky prometheus operator setups before. i think the crux of the issue is that there are so many potential deployment scenarios that we couldn't possibly account for every one with the logic in the chart. part of the responsibility is on the end user being familiar enough with the chart's behavior to know "hey, i don't need to enable the PrometheusRule if i'm not going to be using the Service/PodMonitor". we could add some docs somewhere, though i'm not sure where the best place would be.
sorry, not trying to be a headache.
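to make the granularity point concrete, a rule shaped roughly like this (names and the threshold are illustrative, not the chart's actual rule text) matches that metric from any collector feeding the same TSDB, not just the one this chart deployed:

apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: example-collector-rules  # illustrative name
spec:
  groups:
    - name: collector
      rules:
        - alert: CollectorSendFailedLogRecords
          # no job/namespace matchers, so any collector's series in this TSDB can fire it
          expr: rate(otelcol_exporter_send_failed_log_records[5m]) > 0
          for: 5m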
@@ -1,6 +1,6 @@
 apiVersion: v2
 name: opentelemetry-collector
-version: 0.109.0
+version: 0.109.1
Run make generate-examples CHARTS=opentelemetry-collector to generate the examples.
While fixing this logic, regressions were introduced that would prevent the PrometheusRule from rendering in some fairly specific monitoring scenarios.
I have checked this with as many permutations as I can manually.
If a user wants to render the PrometheusRule, they should be able to, so long as they have enabled the ServiceMonitor or PodMonitor, and in the case of the ServiceMonitor they have set Values.service.enabled == true.
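As a minimal sketch of the intended behavior (based on the values layout shown earlier in the discussion; mode and image are illustrative), a PodMonitor-based setup like this should now also render the PrometheusRule:

mode: "daemonset"
image:
  repository: "otel/opentelemetry-collector-k8s"
ports:
  metrics:
    enabled: true
podMonitor:
  enabled: true
prometheusRule:
  enabled: true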
closes #1379