fix(collector): actually enable the deployment of PrometheusRule when using PodMonitor #1416

Open · wants to merge 4 commits into main
charts/opentelemetry-collector/Chart.yaml (2 changes: 1 addition & 1 deletion)
@@ -1,6 +1,6 @@
apiVersion: v2
name: opentelemetry-collector
-version: 0.109.0
+version: 0.109.1
Member

run make generate-examples CHARTS=opentelemetry-collector to generate the examples.

description: OpenTelemetry Collector Helm chart for Kubernetes
type: application
home: https://opentelemetry.io/
@@ -1,4 +1,4 @@
-{{- if and .Values.prometheusRule.enabled .Values.serviceMonitor.enabled }}
+{{- if and .Values.prometheusRule.enabled (or (eq (include "opentelemetry-collector.serviceEnabled" .) "true") (or .Values.serviceMonitor.enabled .Values.podMonitor.enabled)) }}
apiVersion: monitoring.coreos.com/v1
Member

> If a user wants to render the PrometheusRule, they should be able to so long as they have enabled ServiceMonitor or PodMonitor, and in the case of the ServiceMonitor they have set Values.service.enabled == true

@crutonjohn this sentence reads as if the condition should be:

if (or .Values.podMonitor.enabled (and .Values.serviceMonitor.enabled (eq (include "opentelemetry-collector.serviceEnabled" .) "true")))

Author

in hindsight i probably shouldn't have tried to gotmpl that late in the day 😅

i think i mostly have it figured out:

Suggested change
-{{- if and .Values.prometheusRule.enabled (or (eq (include "opentelemetry-collector.serviceEnabled" .) "true") (or .Values.serviceMonitor.enabled .Values.podMonitor.enabled)) }}
+{{- if and .Values.prometheusRule.enabled (or .Values.podMonitor.enabled (and (eq (include "opentelemetry-collector.serviceEnabled" .) "true") .Values.serviceMonitor.enabled)) }}

however, with this logic a user could use these values:

mode: "deployment"
image:
  repository: "otel/opentelemetry-collector-k8s"
prometheusRule:
  enabled: true
ports:
  metrics:
    enabled: true
podMonitor:
  enabled: false
serviceMonitor:
  enabled: false
service:
  enabled: false

and believe they are going to get a PrometheusRule, but instead get no PrometheusRule/ServiceMonitor/PodMonitor, because neither the ServiceMonitor nor the PodMonitor is enabled.

would it be considered acceptable UX to prevent the user from getting a PrometheusRule CR when they explicitly say they want one? or should we remain somewhat opinionated, in that you should only deploy a PrometheusRule CR when a Pod/ServiceMonitor is being deployed in tandem with it?

happy to accommodate either way, i just don't know what's best for the project.
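
for reference, here's roughly how i've been sanity-checking this locally — just a sketch, assuming the chart repo is checked out and the values above are saved as test-values.yaml (the release name "test" and the file name are placeholders):

# render the chart with the values above and look for a rendered PrometheusRule;
# per the scenario described above, grep should find nothing and fall through to the echo
helm template test ./charts/opentelemetry-collector -f test-values.yaml \
  | grep -B1 -A2 'kind: PrometheusRule' || echo "no PrometheusRule rendered"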

Member

Hmm, that's a good point.

Is there any harm in creating a PrometheusRule even if the pod/service monitors don't exist? What happens if something tries to use the Prometheus rule but the monitors don't exist?

Author

i'd have to do some verification, but iirc a PrometheusRule is just what it sounds like: it would effectively create the rule(s) contained within the file. there is some nuance in how prometheus operator is deployed that can affect how the PrometheusRule "lands" on the cluster... for example, if you are running multiple prometheus instances in the same cluster scraping different targets. the principle is still the same, though.

the rules in the collector chart are generic enough (read: low granularity, since the promql omits label selectors) that a metric like otelcol_exporter_send_failed_log_records could exist in the TSDB from a source that isn't our deployed collector (or the user could have this chart deployed multiple times for different purposes), so users could hypothetically get alerts for something unrelated to our collector deployment.

this would be kind of a "perfect storm" scenario to be fair, but it's definitely possible. i've seen some pretty funky prometheus operator setups before. i think the crux of the issue is that there are so many potential deployment scenarios that we couldn't possibly account for every one with the logic in the chart. part of the responsibility is on the end user being familiar enough with the chart's behavior to know "hey, i don't need to enable the PrometheusRule if i'm not going to be using the Service/PodMonitor". we could add some docs somewhere; idk where the best place would be tho.
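
to illustrate what i mean by "generic", here's a purely hypothetical sketch of a selector-less rule — not the chart's actual rule, the name and threshold are made up:

apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: example-collector-rules  # hypothetical name
spec:
  groups:
    - name: opentelemetry-collector
      rules:
        - alert: ExporterSendFailedLogRecords
          # no label selector on the metric, so this matches any series named
          # otelcol_exporter_send_failed_log_records in the TSDB, not just the
          # ones produced by this chart's collector
          expr: rate(otelcol_exporter_send_failed_log_records[5m]) > 0
          for: 5m
          labels:
            severity: warning
          annotations:
            summary: a collector exporter is failing to send log records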

sorry, not trying to be a headache.

kind: PrometheusRule
metadata: