Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add tool_pid argument to parent_setup_fn #990

Closed
wants to merge 4 commits into from

Conversation

charmoniumQ
Copy link
Contributor

@charmoniumQ charmoniumQ commented Feb 12, 2024

Addresses #989

The container implementation is easy. Just pass the pid we already got to parent_setup_fn. They already wait for the parent to signal readiness.

However, the non-conainer implementation does not already have this wait, so I had to add it in. Like the container case, the child inherits the read-end of a pipe through which the parent sends a byte when they are ready.

There are two ways I thought of to implement the wait:

  1. Wait in preexec_fn. The naive approach creates a deadlock while executing the subprocess.Popen(...). That is not obvious to me, since the preexec_fn gets executed after the fork. It may have something to do with the GIL?
  2. Run a "wrapper" program that waits and then exec's the tool. Replace subprocess.run(cmd) with `subprocess.run(["python", "-c", "do_wait; os.execv(cmd)"]):

I used the second strategy because the first one was deadlocked. However, this creates a few test failures, both relate to the non-container case, where we replace a direct

  1. Something is wrong with test_input_is_redirected_from_stdin probably because the semantics of how file-descriptors are inherited is changed slightly..
  2. It doesn't detect when the command doesn't exist in quite the same way. subprocess.run(...) generates with an exception in the parent process. subprocess.run(["python", "-c", "os.execv(...)"]) generates an exception, but not in the parent process.

I think I am going try debuggin the deadlock in preexec_fn instead.

@PhilippWendler
Copy link
Member

Using a subprocess here is not possible. It adds complexity and a lot of additional possible failure modes and things that can go wrong. And the overhead of this subprocess would be included in the measurement, but BenchExec takes great care to ensure that only a minimal amount of setup and teardown code is included in the measurement.

Doing anything sophisticated inside preexec_fn is also difficult. The documentation already says that one should not use it if the application is multi-threaded, although we are currently having no problem with that. But if you start putting more code into it, especially code that is related to threading, such as waiting, I would not be surprised if it breaks.

So our goal is to avoid additional risks of failure and complexity in this area of BenchExec as far as possible. Right now this PR seems to add something that would be unused inside BenchExec itself and only be of use for potential external extensions of BenchExec, is this correct? In this case we would need a very strong argument for accepting the risk and maintenance effort related to this.

@charmoniumQ
Copy link
Contributor Author

What about making the feature "PID gets passed parent_setup_fn" only available in container-execution mode, where the implementation is much simpler (the main branch already knows the PID of the grandchild at that point)?

The argument for implementation is that eBPF code can likely not be invoked inside the container, but if we have the PID, we can easily invoke outside of the container.

@PhilippWendler
Copy link
Member

If we can add this with negligible risk and maintenance effort, which seems to be the case according to your description, then yes, I would be ok with that.

Medium to long term a lot of this might change due to #875 and because I want to switch from PID tracking to PID FDs as soon as we can ignore kernels that are too old for PID FDs. So I won't promise to keep this forever, but certainly as long as it doesn't hinder us.

@charmoniumQ charmoniumQ force-pushed the patch-parent-setup-fn branch 3 times, most recently from b27ee26 to b8af129 Compare February 13, 2024 20:43
@charmoniumQ charmoniumQ marked this pull request as ready for review February 13, 2024 20:54
@charmoniumQ
Copy link
Contributor Author

The new commit only implements passing PID in ContainerExecutor._start_execution_in_container which is minimally invasive. I update the docstrings to refer to the new feature.

@charmoniumQ charmoniumQ marked this pull request as draft February 13, 2024 21:25
benchexec/containerexecutor.py Show resolved Hide resolved
benchexec/containerexecutor.py Outdated Show resolved Hide resolved
@param parent_setup_fn a function without parameters that is called in the parent process
immediately before the tool is started
@param parent_setup_fn a function that is called in the parent process
immediately before the tool is started. The keyword-arguments passed may differ in subclasses.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think beyond that we need documentation what parameters may actually be passed, either here or in the respective classes.

benchexec/baseexecutor.py Outdated Show resolved Hide resolved
@charmoniumQ
Copy link
Contributor Author

It's actually not too hard to have the child let the parent know its pid in their setup function. The hard part (which I am not going to do) is having the child wait for the parent's setup function to be done before continuing.

Having the parent get the child's PID makes the API consistent. Clients won't have to change their parent_setup_fn when they switch on/off use_namespaces.

Comment on lines +752 to +757
self.parent_setup_fn = (
util.dummy_fn if parent_setup_fn is None else parent_setup_fn
)
self.parent_cleanup_fn = (
util.dummy_fn if parent_cleanup_fn is None else parent_cleanup_fn
)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why store this in attributes? AFAIS the only use is in _execute, which is directly called from this method.

Comment on lines +980 to +981
child_pid=child_pid,
grandchild_pid=grandchild_pid,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need both? child_pid and the fact that there are two processes are really implementation details, and we shouldn't expose them in a public API. Would it be enough if we simply pass the PID of the benchmarked process?

@@ -869,6 +881,8 @@ def postParent(preParent_result, exit_code, base_path):
if exit_code.value not in [0, 1]:
_get_debug_output_after_crash(output_filename, base_path)

self.parent_cleanup_fn(parent_setup, exit_code, base_path)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This does not conform to the documented contract of parent_cleanup_fn. Also, passing exit_code seems fine, but I would hesitate to pass base_path.

Comment on lines +653 to +654
@param parent_setup_fn: A function that gets called in the parent process before the tool is allowed to run (default: noop fn). When used with use_namespaces=True, the function recieves child_pid and grandchild_pid.
@param parent_cleanup_fn: A function that gets called after the tool finishes running with the return value of parent_setup_fn.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should think about how to make it easy to ensure compatibility in the future here. For example, what can we do such that we can add more information in the future?

@@ -829,19 +840,20 @@ def _execute(
# Disable energy measurements because we use only parts of a CPU
packages = None

def preParent():
def preParent(**kwargs):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should use proper parameters here. This is called only internally, so we do not need to pass through an unknown number of parameters.

Comment on lines +103 to +104
os.write(to_parent, str(pid).encode())
os.close(to_parent)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we decide to keep this, it would need to move above the cgroups.add_task(pid) call, because that starts the measurement.

But considering the different semantics to the container case, and the fact that we do not have a use case for it, I would tend to omit it.

@PhilippWendler
Copy link
Member

@charmoniumQ Are you still interested in continuing with this?

@charmoniumQ
Copy link
Contributor Author

No, I don't have the time any more.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

2 participants