Prioritize crawls based on 40x vs 50x vs 20x / 30x #101

Open
brendanheywood opened this issue Dec 17, 2019 · 0 comments

If a URL has already been crawled and it returned a 20x or 30x, then we know it was good and we are simply checking for regressions.

If it was a 40x, then it is either a regression we want to detect, which might be fixed on the other end, or, more likely, the link itself is broken and needs to be fixed.

50x responses, on the other hand, are usually temporary and a clear signal that we should try again, since a retry will likely get a different response.

So I'm suggesting that the previous return code be a soft weighting factor in the queue prioritization, so that 50x URLs are recrawled earlier to try to clear them.
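
To make the idea concrete, here is a rough sketch of what a soft weighting could look like. It is not the crawler's actual queue code: the names `RECRAWL_BONUS_HOURS`, `crawl_score` and `crawl_order` and the bonus values are all made up for illustration, and it assumes the queue already orders URLs by how long ago they were last crawled. The previous status class just adds a bonus to that age, so a 50x jumps ahead of a 20x of similar age, but a very stale 20x can still win.

```python
# Hypothetical bonus, in hours, added to a URL's age based on the class of
# its previous HTTP status. The values are illustrative assumptions only.
RECRAWL_BONUS_HOURS = {
    5: 48.0,  # 50x: likely temporary, retry soon to clear it
    4: 12.0,  # 40x: probably a broken link, recheck somewhat sooner
    3: 0.0,   # 30x: was good, only checking for regressions
    2: 0.0,   # 20x: was good, only checking for regressions
}


def crawl_score(last_status, hours_since_crawl):
    """Return a soft priority score; a higher score is crawled earlier."""
    bonus = RECRAWL_BONUS_HOURS.get(last_status // 100, 0.0)
    return hours_since_crawl + bonus


def crawl_order(entries):
    """Order (url, last_status, hours_since_crawl) tuples by descending score."""
    ranked = sorted(entries, key=lambda e: crawl_score(e[1], e[2]), reverse=True)
    return [url for url, _, _ in ranked]


queue = [
    ("/a", 200, 10.0),
    ("/b", 503, 10.0),
    ("/c", 404, 10.0),
    ("/d", 200, 100.0),
]
print(crawl_order(queue))  # ['/d', '/b', '/c', '/a']
```

Because the status only shifts the score rather than defining strict tiers, `/d` (a 20x not crawled for 100 hours) still goes before the recent 50x at `/b`, which is what makes the weighting "soft".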
