Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add failfast mode #2308

Open
wants to merge 3 commits into
base: master
Choose a base branch
from
Open

Add failfast mode #2308

wants to merge 3 commits into from

Conversation

kuchara
Copy link

@kuchara kuchara commented Jul 7, 2023

Commit adds new command line option, -K, that aborts all running jobs after first failed job.

This implements #2083.

Bartlomiej Kucharczyk added 2 commits July 7, 2023 18:10
Commit adds new command line option, -K, that aborts all running
jobs after first failed job.
Fixing missing parentheses.
@emmenlau
Copy link

Reporting on the current status of this PR:

On Linux, it works flawless! The build stops on first error, which is great! Very nice work!

On Windows, it stops on first error, but I do get a terminal prompt:

6 errors generated.
ninja: build stopped: interrupted by user.
Terminate batch job (Y/N)? 

I am running ninja inside CMD, which in turn has been started from Cygwin bash. I am not certain whether CMD, Cygwin or bash have an effect on this. The prompt seems to originate from CMD, so maybe the kill signal should not be passed to CMD?

@kuchara
Copy link
Author

kuchara commented Jul 17, 2023

Thanks for the checks, @emmenlau!

As I read some userexchange discussions, this message on Windows CMD is "by design". :-( https://superuser.com/a/35701/982937

Do you have any suggestions how to continue here?

@emmenlau
Copy link

emmenlau commented Jul 17, 2023

As I read some userexchange discussions, this message on Windows CMD is "by design". :-( https://superuser.com/a/35701/982937

Yep, after some googling I was also under the impression that CTRL-C is not really awesome for these kind of things.

Do you have any suggestions how to continue here?

Well, I'm really no expert, but sending CTRL-C to oneself seems anyways a "hard" way to end the whole process tree. Are these even separate processes or "just" threads, or even something else? What other ways does Win32 provide to kill them? Do we even need to kill, or can we just send an "interrupt" signal and catch that on the receiving side to exit the processes?

For example https://stackoverflow.com/a/1173396/7200132 has some suggestions. But I did not look into ninja's code, I'm not even sure what kind of processes we are talking about, and how to cascade the kill to sub-processes that ninja started...

@emmenlau
Copy link

Dear @kuchara , this would be a nice feature for us, but I'm not knowledgeable on how to proceed with this feature on Windows. Are you planning to look into this? Thanks for the nice work!

Previous attempt has problems on passing signals on Windows.

This time, reusing similar functions as used when closing (Clear(),
which is expected to interrupt its running children).

For posix, when there is no external signal, 0 is passed to kill()
syscall (thus doing nothing). To mitigate this, setting SIGINT signal
just before calling Clear(), which is essentially simulating SIGINT
handling.
@kuchara
Copy link
Author

kuchara commented Nov 6, 2023

Hello @emmenlau !

I'd love to finish this.

Finally was able to spent some more time to inspect ninja source code, and I've noticed that regular Clear() function done in build.cc is actually sending signals to its children, so decided to give it a try.

Could you try the newest change I've just uploaded?

If that does not work, my only suggestion would be to enable this feature only for posix systems, until there will be someone who could help with Windows case.
In my company we already use patched ninja for some time with very good results (cha-ching!).

@emmenlau
Copy link

Awesome @kuchara , this works very nicely on Windows! I have not tested Linux and MacOSX yet, but the change looks very promising! Great work!


void SubprocessSet::Abort() {
SetInterruptedFlag(SIGINT);
Clear();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for your patch. Can you ensure that SIGINT is also sent to the console subprocess, if there is one?
Otherwise if the failure comes from a non-console command, the console subprocess will continue to run and block Ninja.

Let me clarify:

Ninja launches subprocesses in their own process group, with the exception of commands in the "console" pool. These are launched in the same process group as Ninja itself. This is done to ensure that when Ctrl-C is pressed, both Ninja and the console subprocess receive a SIGINT (I suspect this is required for proper terminal handling too, but I don't remember the details).

By default, the console subprocess receiving the SIGINT would exit, while Ninja would catch the SIGINT, report it when exiting DoWork() to the Builder(), which ends up invoking Clear(). The latter relays the SIGINT to all non-console subprocesses to stop them as well (hence the use_console check in that function).

With your code, if the failure comes from a non-console subprocess, calling SetInterruptedFlag(SIGINT) will tell the next DoWork() call to report an interrupt, but will not send SIGINT to the console process which will continue running. And because it also shared the stdout / stderr descriptors, it will continue printing to the terminal. And Ninja will likely be stuck waiting for it in the Subprocess destructor (which ends up calling waitpid()).

It looks like something similar is true for Win32.

Also, because this is very subtle, I strongly recommend adding a robust unit test for this behavior to your PR.

For example one with a long console subprocess that sleeps for 10 seconds, and another non-console one that fails immediately. The test should verify that Ninja exits immediately with -K. You should also test the case where the console process fails first of course.

Hope this helps

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants