Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow parallel testing, take #2 #181

Open
wants to merge 7 commits into
base: main
Choose a base branch
from

Conversation

Babar
Copy link
Contributor

@Babar Babar commented Dec 9, 2021

Basically a re-do of #158.
This got rolled back because @Babar wrote 3 different PRs and one of them was bad.
This one wasn't. Technically it wasn't good either, as it only did test, not really untest, run or keeptesting.

This re-do fixes this.

Still the same features:

  • Can parametrize the number of threads in the config, or on the command line, default to the number of cores
  • Reports hosts depending whether they failed ssh or something else
  • Doesn't have any extra dependency, as the thread pool is done in a very simple way

Test Plan:
Ran it on my machine. Tried test and untest, both worked.

Signed-off-by: Olivier Raginel [email protected]

@jaymzh
Copy link
Collaborator

jaymzh commented Dec 9, 2021

IIRC, this broke internal wrappers... so CC @NaomiReeves on that.

But you definitely should make sure you've tested this with internal wrappers.

@Babar
Copy link
Contributor Author

Babar commented Dec 9, 2021

The other PR broke the wrappers, from what I remember. The part where I moved the ARGS to be hostnames, 'cause some parts of our wrappers rely on TT just dropping unknown parameters.
I've fixed the wrapper too, but this is mostly to get speed improvements when testing a bunch of hosts.
Also, I've fixed the lint, so updating the PR shortly.

@Babar
Copy link
Contributor Author

Babar commented Dec 9, 2021

Oh one thing I'd love feedback on:

  • How do people think we should deal with failures?
  • How do we deal with logs?
    In test, we report the failures and exit with:
  • 0 if no failure
  • 1 if all hosts failed and at least one failure was due to ssh connection issue
  • 2 otherwise (meaning at least one host failed for whatever reason and at least one succeeded)
  • 3 if all hosts failed because they were already tested by another user
    I kept this logic for the other modes (so untest, run and keeptesting), even though 3 cannot happen.

As for logging, run can get very noisy with all the threads writing at the same time, and not knowing what's what. So I started writing something to prefix the logs with the hostname. But then I went down the rabbit hole, so I'll try to finish this some other night.

# Poor man thread pool manager: keeping it simple
nb_threads_over_max = host_threads.length - TasteTester::Config.parallel_hosts
if nb_threads_over_max >= 0
host_threads[nb_threads_over_max].join

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

shouldn't this join be rescued as well?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point. I guess this never happened in my testing 'cause I have a gazillion of cores :)

@jaymzh
Copy link
Collaborator

jaymzh commented Dec 9, 2021

Oh one thing I'd love feedback on:

  • How do people think we should deal with failures?
  • How do we deal with logs?
    In test, we report the failures and exit with:
  • 0 if no failure
  • 1 if all hosts failed and at least one failure was due to ssh connection issue
  • 2 otherwise (meaning at least one host failed for whatever reason and at least one succeeded)
  • 3 if all hosts failed because they were already tested by another user
    I kept this logic for the other modes (so untest, run and keeptesting), even though 3 cannot happen.

As for logging, run can get very noisy with all the threads writing at the same time, and not knowing what's what. So I started writing something to prefix the logs with the hostname. But then I went down the rabbit hole, so I'll try to finish this some other night.

I'm glad you asked!

A lot of this is actually the kind of stuff that we (we = TM folks, not FB folks, so past-past-past-life) solved over in https://github.com/txcketmaster-xx/onall - it forks a bunch of ssh's, joins 'em all back, and handles output nicely. It has a few options - live, buffered, prefixed-by-hostname, in-a-directory one-file-per-host, etc. and I think it's a great model for how to manage output and make it usable for folks.

As for return values: 0 - no errors, we can all agree on that. But maybe some more nuance for errors:

  • All failed, all SSH failures (no test-setup/run failures)
  • All failed, all because in-testing
  • All failed, other test-setup/run failures
  • All failed, mix of issues
  • Some failed, all SSH related
  • Some failed, all because in-testing
  • Some failed, other test-setup/run failures
  • Some failed, mix of issues

Further, I might recommend outputing a JSON structure of host=>failure, perhaps on stderr, or perhaps to a configurable file.

@Babar
Copy link
Contributor Author

Babar commented Dec 9, 2021

Stupid question, but what's the difference between onall and parallel? They're bot written in perl, and seem to be doing pretty much the same thing. I mean, parallel --tag ssh {} .... Haven't read the entire code yet, but I know parallel's code fairly well, which is why I'm asking.

Summary:
Basically a re-do of facebook#158.
This got rolled back because @Babar wrote 3 different PRs and one of them was bad.
This one wasn't. Technically it wasn't good either, as it only did `test`, not really `untest`, `run` or `keeptesting`.

This re-do fixes this.

Still the same features:
* Can parametrize the number of threads in the config, or on the command line, default to the number of cores
* Reports hosts depending whether they failed ssh or something else
* Doesn't have any extra dependency, as the thread pool is done in a very simple way

Test Plan:
Ran it on my machine. Tried test and untest, both worked.

Reviewers: nreeves, dcavalca, jaymzh

Subscribers:

Tasks:

Tags:
Signed-off-by: Olivier Raginel <[email protected]>
Summary:
As upload can take a very long time, we can just have a dedicated thread for it
@Babar Babar requested a review from gogsbread May 24, 2024 16:36
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants