Configure HealthCheck with podman update #24442

Open: wants to merge 1 commit into main from change-healthcheck-config-via-podman-update

Conversation

@Honny1 (Member) commented Nov 1, 2024

New flags for podman update make it possible to change a container's HealthCheck configuration while the container is running, without restarting or recreating it.

This can help determine why a given container suddenly started failing HealthCheck, without interfering with the services it provides. For example, HealthCheck can be reconfigured to keep logs longer than the usual last X results, to store logs in another destination, etc.

These flags are added to the podman update command (a usage sketch follows the list):

  • --health-cmd string: set a healthcheck command for the container ('none' disables the existing healthcheck)
  • --health-interval string: set an interval for the healthcheck; a value of 'disable' results in no automatic timer setup, and changing this setting resets the timer (default "30s")
  • --health-log-destination string: set the destination of the HealthCheck log: a directory path, 'local', or 'events_logger' ('local' uses the container state file; warning: changing this setting may cause the loss of previous logs) (default "local")
  • --health-max-log-count uint: set the maximum number of attempts in the HealthCheck log file ('0' means an unlimited number of attempts) (default 5)
  • --health-max-log-size uint: set the maximum length, in characters, of the stored HealthCheck log ('0' means unlimited length) (default 500)
  • --health-on-failure string: action to take once the container turns unhealthy (default "none")
  • --health-retries uint: the number of retries allowed before a healthcheck is considered unhealthy (default 3)
  • --health-start-period string: the initialization time needed for a container to bootstrap (default "0s")
  • --health-startup-cmd string: set a startup healthcheck command for the container
  • --health-startup-interval string: set an interval for the startup healthcheck; changing this setting resets the timer, depending on the state of the container (default "30s")
  • --health-startup-retries uint: set the maximum number of retries before the startup healthcheck restarts the container
  • --health-startup-success uint: set the number of consecutive successes needed before the startup healthcheck is marked successful and the normal healthcheck begins ('0' means any success starts the regular healthcheck)
  • --health-startup-timeout string: set the maximum amount of time the startup healthcheck may take before it is considered failed (default "30s")
  • --health-timeout string: the maximum time allowed to complete the healthcheck before it is considered failed (default "30s")
  • --no-healthcheck: disable healthchecks for the container
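
For illustration, a hypothetical session; the container name is a placeholder, the image and echo command follow the tests below, and the inspect field path assumes the Docker-compatible layout:

$ podman run -d --name hc1 --health-cmd "echo ok" quay.io/libpod/alpine:latest top
$ podman update hc1 --health-interval 10s --health-retries 5 --health-max-log-count 0
$ podman inspect hc1 --format {{.Config.Healthcheck.Interval}}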

Fixes: https://issues.redhat.com/browse/RHEL-60561

Does this PR introduce a user-facing change?

Configure HealthCheck with podman update

@openshift-ci openshift-ci bot added do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. do-not-merge/release-note-label-needed Enforce release-note requirement, even if just None labels Nov 1, 2024
@openshift-merge-robot openshift-merge-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Nov 1, 2024
@github-actions github-actions bot added the kind/api-change Change to remote API; merits scrutiny label Nov 1, 2024

Ephemeral COPR build failed. @containers/packit-build please check.

@Honny1 Honny1 force-pushed the change-healthcheck-config-via-podman-update branch from 2f18a35 to 475e16f on November 1, 2024 at 07:35
@openshift-merge-robot openshift-merge-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Nov 1, 2024
@Honny1 Honny1 force-pushed the change-healthcheck-config-via-podman-update branch from 475e16f to e052162 on November 1, 2024 at 15:20
@openshift-ci openshift-ci bot added release-note and removed do-not-merge/release-note-label-needed Enforce release-note requirement, even if just None labels Nov 1, 2024
@Honny1 Honny1 force-pushed the change-healthcheck-config-via-podman-update branch from e052162 to 432b6d3 on November 1, 2024 at 15:52
@Luap99 Luap99 (Member) left a comment:

Not really a full review, just a quick look:

Can you remove all the flag descriptions from the commit message? I do not see how they add any value there; anyone who wants to know what a flag does should look at the docs, and the flags are visible in the diff here anyway.

noHealthCheck := false
for k, v := range *changedHealthCheckConfig {
switch k {
case "NoHealthCheck":
Member:

All these strings seem to be used in many places, so can you define them as constants somewhere, i.e. in libpod/define, and then use the constants instead of duplicating the strings? (A sketch follows.)
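
A minimal sketch of that suggestion; the constant names and location are hypothetical:

// libpod/define (hypothetical names)
const (
    UpdateKeyNoHealthCheck = "NoHealthCheck"
    UpdateKeyHealthCmd     = "HealthCmd"
)

// The switch above then becomes:
switch k {
case define.UpdateKeyNoHealthCheck:
    noHealthCheck = true
case define.UpdateKeyHealthCmd:
    // ...
}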

@Honny1 (Member Author):

Thanks, I have an idea of how to simplify it overall.

@Honny1 Honny1 force-pushed the change-healthcheck-config-via-podman-update branch 2 times, most recently from 621c46c to 96def55 on November 1, 2024 at 19:25
@@ -665,6 +548,141 @@ func DefineCreateFlags(cmd *cobra.Command, cf *entities.ContainerCreateOptions,
`If a container with the same name exists, replace it`,
)
}
if mode == entities.CreateMode || mode == entities.UpdateMode {
// TODO: Focus on disable
Member:

What does this mean?

@Honny1 (Member Author):

My note, I'll delete it.

@@ -17,7 +17,7 @@ import (
)

var (
updateDescription = `Updates the cgroup configuration of a given container`
updateDescription = `Updates the configuration of an already existing container, allowing different resource limits to be set, and HealthCheck configuration. The currently supported options are a subset of the podman create/run.`
Member:

"Updates the configuration of an existing container, allowing changes to resource limits and healthchecks"

@@ -2738,3 +2741,327 @@ func (c *Container) update(resources *spec.LinuxResources, restartPolicy *string

return nil
}

func (c *Container) getCopyOfHealthCheckAndStartupHelathCheck() (*manifest.Schema2HealthConfig, *define.StartupHealthCheck) {
var healthCheck manifest.Schema2HealthConfig
Member:

This and startupHealthCheck should be pointers - will make returning nil a lot easier

func (c *Container) getCopyOfHealthCheckAndStartupHelathCheck() (*manifest.Schema2HealthConfig, *define.StartupHealthCheck) {
var healthCheck manifest.Schema2HealthConfig
if c.config.HealthCheckConfig != nil {
healthCheck = manifest.Schema2HealthConfig{
Member:

I think you can JSONDeepCopy this and save a lot of code


var startupHealthCheck define.StartupHealthCheck
if c.config.StartupHealthCheckConfig != nil {
startupHealthCheck = define.StartupHealthCheck{
Member:

Same here: investigate JSONDeepCopy. I don't think this is performance-critical at all. (A combined sketch follows the hunk below.)

return &healthCheck, &startupHealthCheck
}
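
Taken together, the two suggestions might look like the following sketch. The method name is hypothetical, an error return is added because the copy can fail, and the inlined JSON round-trip stands in for the JSONDeepCopy helper mentioned above (requires encoding/json):

func (c *Container) copyHealthCheckConfigs() (*manifest.Schema2HealthConfig, *define.StartupHealthCheck, error) {
    // Marshal/unmarshal round-trip used as a generic deep copy.
    deepCopy := func(src, dst interface{}) error {
        data, err := json.Marshal(src)
        if err != nil {
            return err
        }
        return json.Unmarshal(data, dst)
    }
    var healthCheck *manifest.Schema2HealthConfig
    var startupHealthCheck *define.StartupHealthCheck
    if c.config.HealthCheckConfig != nil {
        healthCheck = new(manifest.Schema2HealthConfig)
        if err := deepCopy(c.config.HealthCheckConfig, healthCheck); err != nil {
            return nil, nil, err
        }
    }
    if c.config.StartupHealthCheckConfig != nil {
        startupHealthCheck = new(define.StartupHealthCheck)
        if err := deepCopy(c.config.StartupHealthCheckConfig, startupHealthCheck); err != nil {
            return nil, nil, err
        }
    }
    // nil return values now fall out naturally for unset configs.
    return healthCheck, startupHealthCheck, nil
}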

func (c *Container) changeHealthCheckConfiguration(changedHealthCheckConfig *entities.UpdateHealthCheckConfig) (bool, error) {
Member:

I don't like leaking entities into Libpod. Maybe move UpdateHealthCheckConfig into libpod and make the entities version just contain it?

Member:

Actually, no. We should probably move this parsing code out of libpod. Instead we should accept a manifest.Schema2HealthConfig and a define.StartupHealthCheck in Update in libpod, and do all the validation in the frontend.
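
In outline, that might land as something like the following; the exact signature is an assumption:

// The frontend parses and validates the CLI flags; libpod receives
// finished configs (hypothetical signature).
func (c *Container) Update(resources *spec.LinuxResources, restartPolicy *string, restartRetries *uint,
    healthCheck *manifest.Schema2HealthConfig, startupHealthCheck *define.StartupHealthCheck) error {
    // ...apply everything under a single container lock...
    return nil
}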

}

logrus.Debugf("HealthCheck updated for container %s", c.ID())
c.newContainerEvent(events.Update)
Member:

If you're calling this from Update() this is unnecessary, we only want one and it should be in Update itself

Member:

This comment here still seems to be open, we should not be creating two events

@Honny1 Honny1 force-pushed the change-healthcheck-config-via-podman-update branch 3 times, most recently from 74eae00 to 4471330 Compare November 6, 2024 16:46
@Honny1 Honny1 marked this pull request as ready for review November 6, 2024 18:39
@openshift-ci openshift-ci bot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Nov 6, 2024
@Honny1 (Member Author) commented Nov 6, 2024

@mheon and @Luap99 I have edited the PR according to your comments.

@@ -136,6 +137,61 @@ func (c *Container) Update(resources *spec.LinuxResources, restartPolicy *string
return c.update(resources, restartPolicy, restartRetries)
}

// UpdateHealthCheckConfig updates the HealthCheck configuration of the given container.
func (c *Container) UpdateHealthCheckConfig(healthCheckConfig *manifest.Schema2HealthConfig, changedTimer bool, noHealthCheck bool) error {
Member:

noHealthCheck can probably be healthCheckConfig == nil

Member:

And I think we might want to compute changedTimer in here, instead of requiring the user pass it in - moving the rest of the parsing out of libpod was good but this was maybe a step too far.


// UpdateGlobalHealthCheckConfig updates the global HealthCheck configuration of the given container.
// If a value is nil, it is not changed.
func (c *Container) UpdateGlobalHealthCheckConfig(healthLogDestination *string, healthMaxLogCount *uint, healthMaxLogSize *uint, healthCheckOnFailureAction *define.HealthCheckOnFailureAction) error {
Member:

Maybe make a GlobalHealthCheckOptions struct that contains all of these? The argument list is a little long. (Sketch below.)
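
A possible shape for such a struct, mirroring the existing parameters; nil means "leave unchanged", as documented above:

type GlobalHealthCheckOptions struct {
    HealthLogDestination       *string
    HealthMaxLogCount          *uint
    HealthMaxLogSize           *uint
    HealthCheckOnFailureAction *define.HealthCheckOnFailureAction
}

func (c *Container) UpdateGlobalHealthCheckConfig(opts GlobalHealthCheckOptions) error {
    // ...
}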

@@ -2738,3 +2739,122 @@ func (c *Container) update(resources *spec.LinuxResources, restartPolicy *string

return nil
}

func (c *Container) resetHealthCheckTimers(noHealthCheck bool, changedTimer bool, isStartup bool) error {
if !c.ensureState(define.ContainerStateCreated, define.ContainerStateRunning) {
Member:

Maybe Paused as well? Do we stop the timers when we pause a container?

@Honny1 (Member Author):

I haven't found any code that can stop the timer.

Member:

That's interesting. We should talk about this at standup on Monday. Not something we should solve in this PR but it does sound like a bug.

@Honny1 (Member Author), Nov 15, 2024:

The timer runs even after the container is paused. Systemctl displays an error for a few seconds:

Nov 15 18:08:33 fedora podman[42]: error: container <id> is not running

We can discuss it at standup, but I have a midterm on Monday. I can create an issue so we don't forget about it, and then we can discuss it.

c.state.HCUnitName); err != nil {
return err
}
case isStartup && changedTimer && c.config.StartupHealthCheckConfig != nil && !c.state.StartupHCPassed:
Member:

Hm. I think we hit this if we add a startup healthcheck to a running container that did not previously have a startup healthcheck. We probably don't want to do that, so we should set StartupHCPassed to true when adding a startup healthcheck to a running container that doesn't have one already

return err
}

err := c.resetHealthCheckTimers(noHealthCheck, changedTimer, false)
Member:

This can be combined with the following line

return nil
}

func (c *Container) updateStartupHealthCheckConfiguration(startupHealthCheckConfig *define.StartupHealthCheck, changedStartupTimer bool, noHealthCheck bool) error {
Member:

This is basically identical to updateHealthCheckConfiguration minus a few variables; can the two be refactored to share code? It seems like everything from line 2805 on is basically identical.

@@ -217,6 +217,8 @@ const (
Untag Status = "untag"
// Update indicates that a container's configuration has been modified.
Update Status = "update"
// UpdateHealthCheckConfig indicates that a container's HealthCheck configuration has been modified.
UpdateHealthCheckConfig Status = "update-HealthCheck-config"
Member:

Probably unnecessary; we can just use Update


@@option health-interval

Changing this setting resets timer.
@TomSweeneyRedHat (Member), Nov 7, 2024:

Suggested change
Changing this setting resets timer.
Changing this setting resets the timer.

// HealthMaxLogCount set maximum number of attempts in the HealthCheck log file.
// ('0' value means an infinite number of attempts in the log file)
HealthMaxLogCount *uint `json:"health_max_log_count,omitempty"`
// HealthOnFailure set action to take once the container turns unhealthy.
Member:

Suggested change
// HealthOnFailure set action to take once the container turns unhealthy.
// HealthOnFailure set the action to take once the container turns unhealthy.

@TomSweeneyRedHat (Member):

A couple of small nits. Once @mheon's concerns are addressed, LGTM.

@Luap99 Luap99 (Member) left a comment:

@edsantiago Mind looking at the sys tests?

Comment on lines 176 to 193
if [[ $is_startup = "yes" ]]; then
run_podman run -d --name $ctrname \
--health-cmd "echo $msg" \
--health-startup-cmd "echo $msg" \
$IMAGE /home/podman/pause
cid="$output"
else
if [[ $no_hc = "yes" ]]; then
run_podman run -d --name $ctrname \
$IMAGE /home/podman/pause
cid="$output"
else
run_podman run -d --name $ctrname \
--health-cmd "echo $msg" \
$IMAGE /home/podman/pause
cid="$output"
fi
fi
Member:

There is a lot of duplication here. First, you can use elif to avoid one layer of nesting.

In general it is hard for the reader to figure out the differences. You should split out the run_podman call and only handle what actually differs, i.e. define an hc_flags array, add the flags to it, and pass it to a single run_podman call below; that would make this a lot simpler (see the sketch after this comment).
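
A sketch of that restructuring; it assumes bash >= 4.4 so the empty array expands cleanly under set -u:

hc_flags=()
if [[ $is_startup = "yes" ]]; then
    hc_flags=(--health-cmd "echo $msg" --health-startup-cmd "echo $msg")
elif [[ $no_hc != "yes" ]]; then
    hc_flags=(--health-cmd "echo $msg")
fi
# Single call site; only the flags differ between the three cases.
run_podman run -d --name $ctrname "${hc_flags[@]}" $IMAGE /home/podman/pause
cid="$output"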

Member:

Second, what Paul said

Comment on lines 625 to 629
sleep 2s

run_podman update $ctrname --health-interval 1s

sleep 5s
Member:

Adding random sleeps is very bad; your tests are adding over 20s of CI time doing nothing but sleeping, which is very wasteful.

Is there a way we can combine the test cases to avoid adding so many sleeps? If we know one thing, it is that sleeps mean flakes, so this is not very good. If there is no way to avoid it, then at the very least all these tests should run in parallel to not waste so much time. A polling loop, sketched below, is one alternative.
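
One way to drop fixed sleeps is to poll for the condition with a bounded loop; a sketch, with the timeout and inspect field path chosen for illustration:

t0=$SECONDS
while [[ $((SECONDS - t0)) -lt 10 ]]; do
    run_podman inspect $ctrname --format "{{.State.Health.Status}}"
    if [[ "$output" = "healthy" ]]; then
        break
    fi
    sleep 0.5
done
is "$output" "healthy" "container became healthy before the timeout"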

Specgen *specgen.SpecGenerator
NameOrID string
Specgen *specgen.SpecGenerator
ChangedHealthCheckConfiguration *define.UpdateHealthCheckConfig
Member:

I have no idea why we decided to send a full specgen when the server does not understand most of it, but doesn't the specgen already contain all the healthcheck settings? It seems wasteful and confusing to send yet another type. Am I missing something?

@Honny1 (Member Author), Nov 13, 2024:

I created another type where, using pointers, I can better distinguish which values the user actually set. The entire LinuxResources type uses pointers or lists, but the HealthCheck type does not contain pointers, so I can't tell whether the user explicitly set a field to its default value or didn't set it at all.

The implementation of podman update replaces LinuxResources from the specgen without caring about the previous configuration or whether the user set the value. This causes the resource configuration to be overwritten with bad values.

Here is an example:

$ ./bin/podman run -dt --replace --name hc1 quay.io/libpod/alpine:latest top
$ ./bin/podman inspect hc1 --format {{.HostConfig.PidsLimit}}
2048
$ ./bin/podman update hc1 --pids-limit 1024
$ ./bin/podman inspect hc1 --format {{.HostConfig.PidsLimit}}
1024
$ ./bin/podman update hc1 --cpus 2 # Change any other resource.
$ ./bin/podman inspect hc1 --format {{.HostConfig.PidsLimit}}
0  <— this value should be 1024

I think this is a bug. I will create an issue. And I'll fix it in another PR.
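
An excerpt of the pointer-based type described above; HealthMaxLogCount and its tag appear in the diff earlier in this review, while the surrounding fields are illustrative:

type UpdateHealthCheckConfig struct {
    // nil means the user did not set the flag; a non-nil pointer carries
    // the new value, even when it equals the type's zero value.
    HealthCmd         *string `json:"health_cmd,omitempty"`
    HealthInterval    *string `json:"health_interval,omitempty"`
    HealthMaxLogCount *uint   `json:"health_max_log_count,omitempty"`
    NoHealthCheck     *bool   `json:"no_healthcheck,omitempty"`
}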

Member:

Yes, agreed, that is a bug; I guess using specgen there was a mistake then. The issue is that we cannot just break this, as it is exposed via the API.

@Honny1 (Member Author):

The Podman REST API has never used specgen for updates.

@edsantiago edsantiago (Member) left a comment:

DIE and DRY are critical concepts in software development. I encourage you to develop a mindset where you're looking for duplication and, each time you see it, or each time you find yourself copy-pasting, taking a step back to ask yourself how you can eliminate it.

Once you clean up all my suggestions below, I bet you might even find more, and I bet you can then find a way to make the test table-driven, which is even better.

And then, of course, please add # bats test_tags=ci:parallel

@Honny1 Honny1 force-pushed the change-healthcheck-config-via-podman-update branch from 4471330 to dfb98ce on November 12, 2024 at 13:16
@edsantiago edsantiago (Member) left a comment:

So much better! Thank you! You are developing good practices. It still needs fixing, though. A few recommendations are inline, plus one suggestion that I didn't take the time to implement.

Thanks again.

# HealthCheck configuration

# bats test_tags=ci:parallel
@test "podman update - test all HealtCheck flags" {
Member:

typo, HealthCheck

@Honny1 Honny1 force-pushed the change-healthcheck-config-via-podman-update branch 2 times, most recently from a762e8f to 6019246 on November 13, 2024 at 14:21
@edsantiago edsantiago (Member) left a comment:

Thank you for updating and rebasing and for your initiative in cleaning up more than I asked for.

Tests LGTM. One comment inline, not a blocker.

run_podman healthcheck run $ctrname
is "$output" "" "output from 'podman healthcheck run'"

# Run podman update in two separate runs to make sure HealthCheck is overwritten correctly.
Member:

I don't understand this new action (splitting into two commands), nor its comment. I will assume it makes sense to everyone else and won't block on it, but I will point it out in case others are puzzled by it too.

@Honny1 (Member Author):

Because I found a bug. When you try to change any resource using podman update, it overwrites the current resource configuration. Here is an example:

$ ./bin/podman run -dt --replace --name hc1 quay.io/libpod/alpine:latest top
$ ./bin/podman inspect hc1 --format {{.HostConfig.PidsLimit}}
2048
$ ./bin/podman update hc1 --pids-limit 1024
$ ./bin/podman inspect hc1 --format {{.HostConfig.PidsLimit}}
1024
$ ./bin/podman update hc1 --cpus 2 # Change any other resource.
$ ./bin/podman inspect hc1 --format {{.HostConfig.PidsLimit}}
0  <— this value should be 1024

I want to make sure this is not happening for HealthCheck as well.

openshift-ci bot (Contributor) commented Nov 13, 2024:

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: edsantiago, Honny1

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Nov 13, 2024
@Honny1 (Member Author) commented Nov 13, 2024

@mheon @TomSweeneyRedHat @Luap99 I have updated the PR according to your comments.

@Luap99 Luap99 (Member) left a comment:

I need some more time to understand this.

}

logrus.Debugf("HealthCheck updated for container %s", c.ID())
c.newContainerEvent(events.Update)
Member:

This comment here still seems to be open, we should not be creating two events

return nil
}

func (c *Container) updateHealthCheck(newHealthCheckConfig interface{}, currentHealthCheckConfig interface{}, setConfigFunc func(config interface{})) error {
Member:

Why is this accepting interfaces? That makes the API design super fragile and error-prone; we should not be mixing so many types, and reflection is difficult to understand and use correctly.
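
A typed alternative, sketched with a generic helper instead of interface{} and reflection; the helper name is hypothetical:

// applyConfigUpdate stores newCfg over *current when a new config was
// requested, reporting whether anything changed. T is the concrete
// config type, so each call site stays fully type-checked.
func applyConfigUpdate[T any](current **T, newCfg *T) bool {
    if newCfg == nil {
        return false // nothing requested for this config
    }
    *current = newCfg
    return true
}

Each concrete call (healthcheck, startup healthcheck) then keeps its own type, and the compiler checks what reflection previously had to verify at runtime.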

// NOTE: This is only safe while being called from "podman healthcheck run" which we know
// is the case here as we should not alter the exit code of another process that just
// happened to call this.
shutdown.SetExitCode(0)
Member:

Wait, this is no longer true now. If you call this from outside podman healthcheck run we have big problems, especially for the system service, as this overwrites the global exit code on signals.

Comment on lines 2849 to 2854
c.valid = false
err := WithHealthCheckLogDestination(*globalOptions.HealthLogDestination)(c)
if err != nil {
return err
}
c.valid = true
Member:

That is extremely ugly; setting a container to not-valid like this is not sane. If the pointer is shared across several threads, another container may see an invalid container and bail out. I don't think that is a huge risk, as we don't really share these structs at the same time, but still.

And calling a CtrCreateOption function somewhere else is just not nice to read. Please split out the validation bits into another function and then call it from here and from WithHealthCheckLogDestination(). (A sketch follows.)
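
A sketch of that split; the function name is hypothetical, and the accepted values come from the --health-log-destination flag description above:

// Shared by WithHealthCheckLogDestination and the update path, so neither
// needs to toggle c.valid. Requires fmt and path/filepath.
func validateHealthCheckLogDestination(dest string) error {
    switch dest {
    case "local", "events_logger":
        return nil
    }
    if !filepath.IsAbs(dest) {
        return fmt.Errorf("invalid health log destination %q: must be an absolute directory path, \"local\", or \"events_logger\"", dest)
    }
    return nil
}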

utils.Error(w, http.StatusInternalServerError, fmt.Errorf("decode(): %w", err))
return
}
err = ctr.Update(options.Resources, restartPolicy, restartRetries)

Member:

Can you just call (ic *ContainerEngine) ContainerUpdate(), like many other endpoints do, to avoid duplication?

Comment on lines 1816 to 1844
healthCheckConfig, changedHealthCheck, err := specgenutil.GetNewHealthCheckConfig(container.HealthCheckConfig(), *updateOptions.ChangedHealthCheckConfiguration)
if err != nil {
return "", err
}
if changedHealthCheck {
if err := container.UpdateHealthCheckConfig(healthCheckConfig); err != nil {
return "", err
}
}

startupHealthCheckConfig, changedStartupHealthCheck, err := specgenutil.GetNewStartupHealthCheckConfig(container.Config().StartupHealthCheckConfig, *updateOptions.ChangedHealthCheckConfiguration)
if err != nil {
return "", err
}
if changedStartupHealthCheck {
if err := container.UpdateStartupHealthCheckConfig(startupHealthCheckConfig); err != nil {
return "", err
}
}

globalHealthCheckOptions, err := specgenutil.GetNewGlobalHealthCheck(*updateOptions.ChangedHealthCheckConfiguration)
if err != nil {
return "", err
}
if err := container.UpdateGlobalHealthCheckConfig(globalHealthCheckOptions); err != nil {
return "", err
}

if err = container.Update(updateOptions.Specgen.ResourceLimits, restartPolicy, updateOptions.Specgen.RestartRetries); err != nil {
Member:

This entire flow makes updates non-atomic, which seems bad. Every time you call into libpod we have to lock again, sync the state again, and then save the DB config and state again. Doing this multiple times is slow and not atomic, which most likely means ugly race conditions.

I would rather have this moved into one libpod function that acts under one lock. (Sketched below.)
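
In outline, a single-lock version could follow the usual libpod pattern; the options struct is hypothetical, while c.lock, c.batched, syncContainer, and save are the existing container primitives:

func (c *Container) Update(opts *ContainerUpdateOptions) error {
    if !c.batched {
        c.lock.Lock()
        defer c.lock.Unlock()
        if err := c.syncContainer(); err != nil {
            return err
        }
    }
    // Apply resources, healthcheck, startup healthcheck, and the global
    // options in memory, then persist the config and state once.
    // ...
    return c.save()
}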

return uint(valUint), nil
}

func GetChangedHealthCheckConfiguration(cmd *cobra.Command) define.UpdateHealthCheckConfig {
Member:

cobra imports should not be in specgen; CLI parts should all stay under cmd/podman. I see no reason why this function is here.

case "health-max-log-size":
val, err := parseUint(valStr)
if err != nil {
logrus.Error(err)
Member:

all these errors must be handled and returned.


func GetChangedHealthCheckConfiguration(cmd *cobra.Command) define.UpdateHealthCheckConfig {
updateHealthCheckConfig := define.UpdateHealthCheckConfig{}
cmd.Flags().Visit(func(f *pflag.Flag) {
Member:

I don't get why we loop here over all flags at all. Can't we just access each flag directly, given that we have to list out all the names anyway?

@Honny1 (Member Author):

Visit visits only those flags that have been set.

Member:

You can use Changed() on the flag set to know whether a flag was set; for example:
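
The flag name below comes from the list in the PR description, and the target field mirrors the pointer-based update type discussed earlier:

if cmd.Flags().Changed("health-interval") {
    val, err := cmd.Flags().GetString("health-interval")
    if err != nil {
        return err
    }
    updateHealthCheckConfig.HealthInterval = &val
}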

@Honny1 Honny1 force-pushed the change-healthcheck-config-via-podman-update branch 4 times, most recently from 772a856 to 9145618 on November 15, 2024 at 16:55
New flags in a `podman update` can change the configuration of HealthCheck when the container is started, without having to restart or recreate the container.

This can help determine why a given container suddenly started failing HealthCheck without interfering with the services it provides. For example, reconfigure HealthCheck to keep logs longer than the usual last X results, store logs to other destinations, etc.

Fixes: https://issues.redhat.com/browse/RHEL-60561

Signed-off-by: Jan Rodák <[email protected]>
@Honny1 (Member Author) commented Nov 18, 2024

/packit retest-failed

@TomSweeneyRedHat (Member):

If @Luap99 is happy with the answers in your conversations, then LGTM. Happy Green Test Buttons.
