change compile_all to parallel #9

Open
felixschurk opened this issue Nov 26, 2022 · 5 comments

Comments

@felixschurk

Hi,
this is more of an enhancement request than an issue.
Compiling everything with ./calliope -p 2022 takes a long time when the directory contains many files.

My idea is to pass all the files to GNU parallel (https://www.gnu.org/software/parallel/), which would then run the compilations on as many CPUs as the machine has.
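A minimal sketch of that idea, assuming bash and GNU parallel are installed. The function name compile_dir_parallel is hypothetical, and pdflatex is only a stand-in for whatever command calliope actually runs per file; the optional second argument lets you swap it out.

```bash
#!/usr/bin/env bash
# Hypothetical sketch: hand every .tex file under a directory to GNU parallel,
# which runs one job per CPU core by default.
# pdflatex is only a stand-in for calliope's real per-file compile step.
compile_dir_parallel() {
    local dir=$1
    local cmd=${2:-pdflatex -interaction=nonstopmode}
    # -print0 / -0 keep file names containing spaces intact.
    # {//} expands to the file's directory and {/} to its basename
    # (parallel shell-quotes both), so each job compiles next to its source.
    find "$dir" -name '*.tex' -print0 |
        parallel -0 "cd {//} && $cmd {/}"
}

# Example: compile_dir_parallel path/to/entries
```

The exit status of the pipeline is that of parallel, so callers can still tell whether anything failed.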

One downside is that I have not yet figured out how to stop when a file cannot be compiled.
However, this only causes a problem for the "bad" file; the others continue to be compiled.

I compared the timings with time for 59 .tex files:
parallel:

real	6m27.384s
user	16m23.058s
sys	5m38.090s

sequential:

real	13m10.128s
user	10m54.348s
sys	1m53.366s

So the parallel run took roughly half the wall-clock time (about a 2x speedup), although it used more total CPU time (user + sys).
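For reference, the speedup can be computed directly from the time output above. This is a small sketch with hypothetical helper names (to_seconds, ratio) that only handles the XmY.YYYs format shown, using awk for the floating-point arithmetic.

```bash
#!/usr/bin/env bash
# Convert a bash `time` value such as 13m10.128s into seconds.
to_seconds() {
    awk -v t="$1" 'BEGIN {
        split(t, p, "m")      # p[1] = minutes, p[2] = seconds with trailing "s"
        sub(/s$/, "", p[2])   # drop the trailing "s"
        printf "%.3f\n", p[1] * 60 + p[2]
    }'
}

# Ratio of two durations, e.g. sequential / parallel wall-clock time.
ratio() {
    awk -v a="$(to_seconds "$1")" -v b="$(to_seconds "$2")" \
        'BEGIN { printf "%.2f\n", a / b }'
}
```

With the numbers above, ratio 13m10.128s 6m27.384s prints 2.04 (790.128 s versus 387.384 s of wall-clock time).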

If you think this would be a useful enhancement, I could open a pull request for it.

@sanjayankur31
Owner

Thanks for this @felixschurk . Yes, it would certainly be an enhancement. Please feel free to open a PR and we can refine it before the next release.

I've thought of this before, but as you note, it can be tricky to ensure that it is done correctly. I think we'll have to do something along the lines of:

  • if parallel exists, use parallel compilation; otherwise, fall back to compiling files one at a time
  • parallel returns a non-zero exit status if any of its tasks fail, so we can use that to check whether any task failed; however, I haven't worked out how we'd figure out which particular task had failed

Finally, if we can generalise this logic into a separate function, it could perhaps also be used for other parts of the script that process multiple files, such as the encryption/decryption steps.
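The points above could be combined into one reusable helper along these lines. This is a sketch under assumptions: run_on_files is a hypothetical name, not something in calliope, and its return value mimics GNU parallel's documented exit status (the number of failed jobs, with 101 meaning more than 100 failed). It reports how many files failed, not which ones.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of calliope): run a command once per file,
# using GNU parallel when it is installed and a plain loop otherwise.
# Returns the number of failed files, capped at 101 like GNU parallel.
run_on_files() {
    local cmd=$1
    shift
    (( $# == 0 )) && return 0    # nothing to do

    if command -v parallel >/dev/null 2>&1; then
        # parallel appends each (shell-quoted) file name to $cmd and runs
        # one job per CPU core by default; it keeps going after failures.
        parallel "$cmd" ::: "$@"
        return
    fi

    # Fallback: sequential loop that also keeps going after a failure.
    local failed=0 f
    for f in "$@"; do
        # $cmd is deliberately unquoted so "cmd -opt" splits into words.
        $cmd "$f" || failed=$((failed + 1))
    done
    (( failed > 101 )) && failed=101
    return "$failed"
}
```

For example, run_on_files "pdflatex -interaction=nonstopmode" *.tex (pdflatex being a stand-in). Passing the command as one string keeps the helper simple, so the same function could wrap other per-file commands.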

What do you think?

@MarkLeakos

MarkLeakos commented Jan 23, 2023

From ChatGPT:

There are a few ways to determine which particular task failed when using GNU parallel. One way is to run
parallel --joblog logfile

which creates a log file that records the exit status and command of each task run. By looking at the log file, you can determine which command failed and its corresponding exit status.
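As a sketch of how such a log could be read back, assuming the tab-separated layout GNU parallel uses for --joblog (a header line, then Exitval in column 7, Signal in column 8 and Command in column 9); failed_jobs is a hypothetical helper name.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print "exit-value<TAB>command" for every failed job
# recorded in a file written by `parallel --joblog FILE ...`.
failed_jobs() {
    local joblog=$1
    # Skip the header line; a job failed if it exited non-zero or was
    # killed by a signal. Commands containing TABs would be truncated.
    awk -F'\t' 'NR > 1 && ($7 != 0 || $8 != 0) { print $7 "\t" $9 }' "$joblog"
}
```

For example: parallel --joblog compile.log pdflatex -interaction=nonstopmode ::: *.tex, followed by failed_jobs compile.log to see which files did not compile.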

You can also run:
parallel --tag
which adds the command's arguments as a prefix to the output. This allows you to easily identify the output of each task, and if a task fails, you can identify which command failed by looking at the output.

You can also use the
parallel --bar
option to give an overview of the progress of the commands, and this also indicates which command failed with a red X.

You can also use the
parallel --halt
option with a value; for example, --halt 1 stops parallel from starting new jobs once any command exits with a non-zero status (jobs already running are allowed to finish).
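A small demonstration of the halting behaviour, assuming a GNU parallel recent enough to accept the when,why form of --halt (the legacy --halt 1 corresponds to soon,fail=1). halt_demo is a hypothetical name; each job echoes its argument and then exits with it, so the argument 1 plays the role of a file that fails to compile.

```bash
#!/usr/bin/env bash
# Hypothetical demo of --halt:
#   soon,fail=1 -> after the first failed job, start no new jobs but let
#                  running ones finish
#   now,fail=1  -> kill running jobs immediately instead
halt_demo() {
    # -j1 runs one job at a time so the stopping point is deterministic.
    parallel -j1 --halt soon,fail=1 'echo {}; exit {}' ::: 0 1 0
}

# halt_demo   # prints 0 and 1; the third job is never started
```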

If you are using the parallel command inside a shell script, you can use the
PIPESTATUS
variable to check the exit status of each command in a pipeline.
End of ChatGPT output.
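Note that PIPESTATUS holds one exit status per stage of the most recent pipeline, not one per parallel job. The sketch below (show_pipestatus is a hypothetical name, and grep stands in for parallel as the last stage) shows what it contains.

```bash
#!/usr/bin/env bash
# Hypothetical demo: the pipeline has two stages; printf succeeds (0)
# and grep -q finds no match (1).
show_pipestatus() {
    printf '%s\n' a.tex b.tex | grep -q missing.tex
    # Copy PIPESTATUS right away; the next command overwrites it.
    local statuses=("${PIPESTATUS[@]}")
    echo "${statuses[*]}"
}

# show_pipestatus   # prints: 0 1
```

With parallel as the last stage of a pipeline such as find . -name '*.tex' | parallel ..., the last PIPESTATUS entry is simply parallel's own exit status, which says how many jobs failed but not which files.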

I have no idea whether any of this is helpful, but I thought I'd post it here for you to decide.
Good luck.

@sanjayankur31
Owner

Thanks @MarkLeakos: unfortunately, ChatGPT is not known for its accuracy, so I'd rather not depend on what it says for things like this (especially when it does not provide references). man parallel seems quite exhaustive, so I'd expect the answer to be in there.

PS: #10 is on my list of things to do, I just have to find the time to work on it :)

@felixschurk
Author

Thank you @MarkLeakos, I did not know before what exactly I was searching for :D But now, with --joblog, there is a proper record of what parallel did.

The current PR #10 now produces output that can be checked for documents that failed to compile.
I thought it preferable for parallel to keep working through all files rather than stopping when one gives an error, since the documents should usually be independent.

I also added --bar, since it gives an overview of progress when there are many files to process.
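A sketch of the behaviour described above, not the actual code from PR #10: compile every file without halting on errors, show a progress bar, and keep a joblog for checking failures afterwards. compile_with_log and the COMPILE_CMD override are hypothetical, and pdflatex is a stand-in command.

```bash
#!/usr/bin/env bash
# Hypothetical sketch, not the code in PR #10.
compile_with_log() {
    local log=$1
    shift
    (( $# == 0 )) && return 0    # no files given
    # COMPILE_CMD is a hypothetical override; pdflatex is only a stand-in.
    local cmd=${COMPILE_CMD:-pdflatex -interaction=nonstopmode}
    # --halt is left at its default ("never"), so the remaining files are
    # still compiled after one fails; --bar draws a progress bar on stderr;
    # --joblog records one line per job, including its exit value.
    parallel --bar --joblog "$log" "$cmd" ::: "$@"
    local status=$?
    if (( status != 0 )); then
        echo "Some files failed (parallel exit status $status); see $log" >&2
    fi
    return "$status"
}
```

Not halting matches the reasoning that the documents are independent: one broken file costs only its own output, and the joblog shows which one it was.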

@MarkLeakos

MarkLeakos commented Jan 24, 2023 via email
