Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

10 x time limit in failure log entries #980

Open
galbaras opened this issue Sep 5, 2023 · 10 comments
Open

10 x time limit in failure log entries #980

galbaras opened this issue Sep 5, 2023 · 10 comments
Labels
good first issue The issue is a good candidate for the first community contribution/for a newcomer to the team. priority: normal The issue/PR is normal priority—not many people are affected or there’s a workaround, etc. type: enhancement The issue is a request for an enhancement.

Comments

@galbaras
Copy link

galbaras commented Sep 5, 2023

After setting the queue runner time limit to 600, failed actions show the message "action marked as failed after 6000 seconds. Unknown error occurred. Check server, PHP and database error logs to diagnose cause."

This is caused by Line 229 in ActionScheduler_Abstract_QueueRunner.php, which says

$this->cleaner->clean( 10 * $this->get_time_limit() );

This gives the impression that the action timed out after a much longer time than it likely did and should probably be adjusted to reflect the actual time limit used.

@barryhughes
Copy link
Member

Hi @galbaras,

It doesn't feel inaccurate to me (in these cases, the action has indeed been marked as failed after 6,000 seconds have elapsed) but perhaps we could tweak the language to make it less ambiguous. For example:

This action was set to a status of in-progress at least 6000 seconds ago. There has been no further change and this suggests it has stopped running, therefore the status will now be updated to failed.

Alternatively, we could attempt to time the actual action duration and use PHP shutdown functions to handle this in situations where execution times out ... but, that is a larger effort and I'm a little unsure how valuable it would be to most users.

What do you think—and are there scenarios you can share where more accurate figures would be useful?

@barryhughes barryhughes added the needs feedback The issue/PR needs a response from any of the parties involved in the issue. label Sep 6, 2023
@galbaras
Copy link
Author

galbaras commented Sep 6, 2023

The cleaner doesn't seem to run after 10x the time. This figure is just hardcoded with no other reference I could find that would explain why. Since I sent the time limit to 600, wouldn't the task have died after 600 seconds already?

Basically, the log should help with troubleshooting as much as possible, and I'm seeing nothing relevant in my error logs, which is totally weird. Increasing the time limit seemed to help, although the log previous quoted "300", so I doubled it, but really multiplied by 20, because of this 10x factor in the code.

@barryhughes
Copy link
Member

The cleaner doesn't seem to run after 10x the time.

Right, but it doesn't reflect when the cleaner itself runs: instead it is used to form the cutoff time when the cleaner queries for stalled actions.

Put another way: all we are really trying to communicate is that we've come across an action that was set to in-progress, but then hasn't seen any further updates in the last cutoff seconds. The value of the cutoff being configurable (6,000 seconds as in your initial posts).

@barryhughes
Copy link
Member

Since I sent the time limit to 600, wouldn't the task have died after 600 seconds already?

No ... if we're talking about the same time limit, this is the number of seconds in which we can continue to process batches of actions. However:

A) Many actions may be processed in a single batch.
B) The time limit is assessed after each batch.

So, conceivably (and depending on your server config), a single action could run for a much longer time than specified for the batch time out.

@galbaras
Copy link
Author

galbaras commented Sep 7, 2023

Got it. Thank you.

@galbaras galbaras closed this as completed Sep 7, 2023
@barryhughes
Copy link
Member

@galbaras—though you closed this, I think you may be right that the message is not as clear as it could be (if you found it confusing, probably others will).

Would the alternative text I posted in my initial reply be clearer? Or, any other wording—perhaps if we just drop the reference to the number of seconds?

Action marked as failed: an unknown error occurred, or the action may have timed out. Check server, PHP and database error logs to diagnose the cause.

@galbaras
Copy link
Author

galbaras commented Sep 7, 2023

Mentioning the check interval is actually helpful, as it was in my case. How about this:

action checked after 6000 seconds and was no longer running. Check server, PHP and database error logs for possible causes

The "failed" status already indicates that it's failed, so that bit is redundant, and if the error is unknown, why say it? It provides no additional information.

While we're on this, I've found nothing in the PHP logs, I'm on a shared server, so no access to database logs or server logs (unless you mean the access log). PHP normally catches issues, including timeouts, memory exceptions, query errors and database connection issues, but the actions I've been looking at (Rank Math data fetches from Google) seem to leave no trace. Can this be related to how they are triggered?

@galbaras galbaras reopened this Sep 7, 2023
@barryhughes
Copy link
Member

Thanks!

While we're on this, I've found nothing in the PHP logs

Yep, there's no guarantee that the logs will contain useful information—they may contain nothing of interest at all (but, there may be cases where they do).

@barryhughes
Copy link
Member

Rank Math data fetches from Google

Could be worth flagging with the Rank Math team? They can probably provide guidance on further debugging in relation to the actual action code they added.

@galbaras
Copy link
Author

galbaras commented Sep 8, 2023

They just suggested using the Action Scheduler debug addon, which kills performance 😞

@barryhughes barryhughes added type: enhancement The issue is a request for an enhancement. priority: normal The issue/PR is normal priority—not many people are affected or there’s a workaround, etc. good first issue The issue is a good candidate for the first community contribution/for a newcomer to the team. and removed needs feedback The issue/PR needs a response from any of the parties involved in the issue. labels Sep 8, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
good first issue The issue is a good candidate for the first community contribution/for a newcomer to the team. priority: normal The issue/PR is normal priority—not many people are affected or there’s a workaround, etc. type: enhancement The issue is a request for an enhancement.
Projects
None yet
Development

No branches or pull requests

2 participants