Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

dml_write_exception thrown from parse_html() breaks crawl process #131

Closed
pennedav opened this issue Mar 26, 2020 · 2 comments
Closed

dml_write_exception thrown from parse_html() breaks crawl process #131

pennedav opened this issue Mar 26, 2020 · 2 comments
Labels

Comments

@pennedav
Copy link
Contributor

This is an issue I alluded to in issue #127:

I had a case where the crawler just seemed to get stuck. Turns out it was because of a dml_write_exception being thrown from within crawl() in classes/robot/crawler.php. In my case it was because there was a page having a link that was longer than 1033 characters (the size of the url field).

I'm not sure the best solution to this. At some point you have to have a field size limit and 1033 seems big enough, but the Moodle Wiki seems to cause issues in this case. There are some links that exceed that amount resulting in the underlying dml_write_exception.

I will submit a patch which at least captures these exceptions, logs the error using mtrace(), then continues processing other URLs in the queue. This is good enough for me at this point, however, ideally there'd be some way of logging this information where a report could be pulled by the administrator along with the broken links report, but that's beyond the amount of time I have to work on this right now.

@brendanheywood
Copy link
Contributor

@pennedav we've recently refactored these tables to use a persistent, and importantly we no longer directly query based on the url, but a hash of the url, which means we should be able to just swap this from a char to a text field of random length. Will you still be submitting a patch?

@pennedav
Copy link
Contributor Author

@brendanheywood FYI I've been laid off from my employer, for whom I was working on this. Feel free to use the patch as-is or reject the pull as you see fit. Sounds like your refactoring will address the issue anyway.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

2 participants