Why does sitemap.xml get crawled before the passed URL #40

Open
emersonthis opened this issue Oct 19, 2020 · 0 comments
Labels
question Further information is requested

Ex:

$ lighthouse-parade https://www.baptistjax.com
Created CSV file
Starting the crawl...
Crawled https://www.baptistjax.com/sitemap.xml [text/xml] (646288 bytes)
Crawled https://www.baptistjax.com/ [text/html; charset=utf-8] (289135 bytes)
Report is done for https://www.baptistjax.com/
Wrote report for https://www.baptistjax.com/
Crawled https://www.baptistjax.com/services [text/html; charset=utf-8] (246368 bytes)
Report is done for https://www.baptistjax.com/services
Wrote report for https://www.baptistjax.com/services
Crawled https://www.baptistjax.com/site-search [text/html; charset=utf-8] (244996 bytes)

Notice that `sitemap.xml` is crawled before the URL I requested. Why? Maybe this is some internal logic of simplecrawler?

Possibly related to #3

@emersonthis emersonthis added the question Further information is requested label Oct 19, 2020