
Update Drupal UrlScraper #1790

Open
wants to merge 2 commits into
base: main
from



@TravisCarden TravisCarden commented Jul 29, 2022

This updates Drupal 7 and 8 and adds Drupal 9 and 10.

  • Updated the versions and releases in the scraper file
  • Ensured the license is up-to-date and that the documentation's entry in the array in about_tmpl.coffee matches its data in self.attribution
  • Ensured the icons and the SOURCE file in public/icons/your_scraper_name/ are up-to-date if the documentation has a custom icon
  • Ensured self.links contains up-to-date URLs if self.links is defined
  • Tested the changes locally to ensure:
    • The scraper still works without errors
    • The scraped documentation still looks consistent with the rest of DevDocs
    • The categorization of entries is still good

@TravisCarden
Author

I'm trying to test locally, and I get the error below both before and after my changes. Can anyone help me debug it?

thor docs:generate "Drupal@7" --debug
/!\ WARNING /!\

Some scrapers send thousands of HTTP requests in a short period of time,
which can slow down the source site and trouble its maintainers.

Please scrape responsibly. Don't do it unless you're modifying the code.

To download the latest tested version of this documentation, run:
  thor docs:download Drupal@7

Proceed? (y/n) y
Queue:   api.drupal.org/api/drupal/7.x
Queue:   api.drupal.org/api/drupal/groups/7.x
Queue:   api.drupal.org/api/drupal/groups/7.x?page=1
ERROR:
  https://api.drupal.org/api/drupal/7.x
  RuntimeError: Error status code (0): URL using bad/illegal format or missing URL
    https://api.drupal.org/api/drupal/7.x



  /Users/traviscarden/Projects/other/devdocs/lib/docs/core/scrapers/url_scraper.rb:49:in `process_response?'
  /Users/traviscarden/Projects/other/devdocs/lib/docs/core/scraper.rb:158:in `handle_response'
  /Users/traviscarden/Projects/other/devdocs/lib/docs/core/scraper.rb:77:in `block in build_pages'
  /Users/traviscarden/Projects/other/devdocs/lib/docs/core/requester.rb:59:in `block (2 levels) in handle_response'
  /Users/traviscarden/Projects/other/devdocs/lib/docs/core/requester.rb:58:in `each'
  /Users/traviscarden/Projects/other/devdocs/lib/docs/core/requester.rb:58:in `block in handle_response'
  /Users/traviscarden/Projects/other/devdocs/lib/docs/core/instrumentable.rb:15:in `instrument'
  /Users/traviscarden/Projects/other/devdocs/lib/docs/core/requester.rb:57:in `handle_response'
  /Users/traviscarden/Projects/other/devdocs/lib/docs/core/requester.rb:18:in `run'
  /Users/traviscarden/Projects/other/devdocs/lib/docs/core/scrapers/url_scraper.rb:38:in `request_all'
  /Users/traviscarden/Projects/other/devdocs/lib/docs/core/scraper.rb:76:in `build_pages'
  /Users/traviscarden/Projects/other/devdocs/lib/docs/core/doc.rb:115:in `block in store_pages'
  /Users/traviscarden/Projects/other/devdocs/lib/docs/storage/abstract_store.rb:87:in `block (2 levels) in replace'
  /Users/traviscarden/Projects/other/devdocs/lib/docs/storage/abstract_store.rb:182:in `track_touched'
  /Users/traviscarden/Projects/other/devdocs/lib/docs/storage/abstract_store.rb:87:in `block in replace'
  /Users/traviscarden/Projects/other/devdocs/lib/docs/storage/abstract_store.rb:170:in `lock'
  /Users/traviscarden/Projects/other/devdocs/lib/docs/storage/abstract_store.rb:87:in `replace'
  /Users/traviscarden/Projects/other/devdocs/lib/docs/storage/abstract_store.rb:85:in `block in replace'
  /Users/traviscarden/Projects/other/devdocs/lib/docs/storage/abstract_store.rb:144:in `open_yield_close'
  /Users/traviscarden/Projects/other/devdocs/lib/docs/storage/abstract_store.rb:30:in `open'
  /Users/traviscarden/Projects/other/devdocs/lib/docs/storage/abstract_store.rb:85:in `replace'
  /Users/traviscarden/Projects/other/devdocs/lib/docs/core/doc.rb:114:in `store_pages'
  /Users/traviscarden/Projects/other/devdocs/lib/docs.rb:100:in `generate'
  /Users/traviscarden/Projects/other/devdocs/lib/tasks/docs.thor:303:in `generate_doc'
  /Users/traviscarden/Projects/other/devdocs/lib/tasks/docs.thor:105:in `generate'

ERROR:
  https://api.drupal.org/api/drupal/groups/7.x
  RuntimeError: Error status code (0): URL using bad/illegal format or missing URL
    https://api.drupal.org/api/drupal/groups/7.x



  /Users/traviscarden/Projects/other/devdocs/lib/docs/core/scrapers/url_scraper.rb:49:in `process_response?'
  /Users/traviscarden/Projects/other/devdocs/lib/docs/core/scraper.rb:158:in `handle_response'
  /Users/traviscarden/Projects/other/devdocs/lib/docs/core/scraper.rb:77:in `block in build_pages'
  /Users/traviscarden/Projects/other/devdocs/lib/docs/core/requester.rb:59:in `block (2 levels) in handle_response'
  /Users/traviscarden/Projects/other/devdocs/lib/docs/core/requester.rb:58:in `each'
  /Users/traviscarden/Projects/other/devdocs/lib/docs/core/requester.rb:58:in `block in handle_response'
  /Users/traviscarden/Projects/other/devdocs/lib/docs/core/instrumentable.rb:15:in `instrument'
  /Users/traviscarden/Projects/other/devdocs/lib/docs/core/requester.rb:57:in `handle_response'
  /Users/traviscarden/Projects/other/devdocs/lib/docs/core/requester.rb:18:in `run'
  /Users/traviscarden/Projects/other/devdocs/lib/docs/core/scrapers/url_scraper.rb:38:in `request_all'
  /Users/traviscarden/Projects/other/devdocs/lib/docs/core/scraper.rb:76:in `build_pages'
  /Users/traviscarden/Projects/other/devdocs/lib/docs/core/doc.rb:115:in `block in store_pages'
  /Users/traviscarden/Projects/other/devdocs/lib/docs/storage/abstract_store.rb:87:in `block (2 levels) in replace'
  /Users/traviscarden/Projects/other/devdocs/lib/docs/storage/abstract_store.rb:182:in `track_touched'
  /Users/traviscarden/Projects/other/devdocs/lib/docs/storage/abstract_store.rb:87:in `block in replace'
  /Users/traviscarden/Projects/other/devdocs/lib/docs/storage/abstract_store.rb:170:in `lock'
  /Users/traviscarden/Projects/other/devdocs/lib/docs/storage/abstract_store.rb:87:in `replace'
  /Users/traviscarden/Projects/other/devdocs/lib/docs/storage/abstract_store.rb:85:in `block in replace'
  /Users/traviscarden/Projects/other/devdocs/lib/docs/storage/abstract_store.rb:144:in `open_yield_close'
  /Users/traviscarden/Projects/other/devdocs/lib/docs/storage/abstract_store.rb:30:in `open'
  /Users/traviscarden/Projects/other/devdocs/lib/docs/storage/abstract_store.rb:85:in `replace'
  /Users/traviscarden/Projects/other/devdocs/lib/docs/core/doc.rb:114:in `store_pages'
  /Users/traviscarden/Projects/other/devdocs/lib/docs.rb:100:in `generate'
  /Users/traviscarden/Projects/other/devdocs/lib/tasks/docs.thor:303:in `generate_doc'
  /Users/traviscarden/Projects/other/devdocs/lib/tasks/docs.thor:105:in `generate'

ERROR:
  https://api.drupal.org/api/drupal/groups/7.x?page=1
  RuntimeError: Error status code (0): URL using bad/illegal format or missing URL
    https://api.drupal.org/api/drupal/groups/7.x?page=1



  /Users/traviscarden/Projects/other/devdocs/lib/docs/core/scrapers/url_scraper.rb:49:in `process_response?'
  /Users/traviscarden/Projects/other/devdocs/lib/docs/core/scraper.rb:158:in `handle_response'
  /Users/traviscarden/Projects/other/devdocs/lib/docs/core/scraper.rb:77:in `block in build_pages'
  /Users/traviscarden/Projects/other/devdocs/lib/docs/core/requester.rb:59:in `block (2 levels) in handle_response'
  /Users/traviscarden/Projects/other/devdocs/lib/docs/core/requester.rb:58:in `each'
  /Users/traviscarden/Projects/other/devdocs/lib/docs/core/requester.rb:58:in `block in handle_response'
  /Users/traviscarden/Projects/other/devdocs/lib/docs/core/instrumentable.rb:15:in `instrument'
  /Users/traviscarden/Projects/other/devdocs/lib/docs/core/requester.rb:57:in `handle_response'
  /Users/traviscarden/Projects/other/devdocs/lib/docs/core/requester.rb:18:in `run'
  /Users/traviscarden/Projects/other/devdocs/lib/docs/core/scrapers/url_scraper.rb:38:in `request_all'
  /Users/traviscarden/Projects/other/devdocs/lib/docs/core/scraper.rb:76:in `build_pages'
  /Users/traviscarden/Projects/other/devdocs/lib/docs/core/doc.rb:115:in `block in store_pages'
  /Users/traviscarden/Projects/other/devdocs/lib/docs/storage/abstract_store.rb:87:in `block (2 levels) in replace'
  /Users/traviscarden/Projects/other/devdocs/lib/docs/storage/abstract_store.rb:182:in `track_touched'
  /Users/traviscarden/Projects/other/devdocs/lib/docs/storage/abstract_store.rb:87:in `block in replace'
  /Users/traviscarden/Projects/other/devdocs/lib/docs/storage/abstract_store.rb:170:in `lock'
  /Users/traviscarden/Projects/other/devdocs/lib/docs/storage/abstract_store.rb:87:in `replace'
  /Users/traviscarden/Projects/other/devdocs/lib/docs/storage/abstract_store.rb:85:in `block in replace'
  /Users/traviscarden/Projects/other/devdocs/lib/docs/storage/abstract_store.rb:144:in `open_yield_close'
  /Users/traviscarden/Projects/other/devdocs/lib/docs/storage/abstract_store.rb:30:in `open'
  /Users/traviscarden/Projects/other/devdocs/lib/docs/storage/abstract_store.rb:85:in `replace'
  /Users/traviscarden/Projects/other/devdocs/lib/docs/core/doc.rb:114:in `store_pages'
  /Users/traviscarden/Projects/other/devdocs/lib/docs.rb:100:in `generate'
  /Users/traviscarden/Projects/other/devdocs/lib/tasks/docs.thor:303:in `generate_doc'
  /Users/traviscarden/Projects/other/devdocs/lib/tasks/docs.thor:105:in `generate'

Failed!
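The message in each error, "URL using bad/illegal format or missing URL", is libcurl's description of its CURLE_URL_MALFORMAT error, and a status code of 0 suggests that no HTTP response came back at all. Since DevDocs sends its requests through Typhoeus, which wraps libcurl, one hedged way to narrow this down is to confirm that the queued URLs are themselves well-formed. If they are, the problem more likely lies in the local libcurl/Typhoeus setup than in the scraper definition. A minimal sketch using only Ruby's standard `uri` library (this check is an illustration, not part of DevDocs):

```ruby
require 'uri'

# The three URLs DevDocs queued before failing, copied from the log.
urls = %w[
  https://api.drupal.org/api/drupal/7.x
  https://api.drupal.org/api/drupal/groups/7.x
  https://api.drupal.org/api/drupal/groups/7.x?page=1
]

urls.each do |url|
  # URI.parse raises URI::InvalidURIError on syntactically invalid input.
  uri = URI.parse(url)
  ok = uri.is_a?(URI::HTTPS) && uri.host == 'api.drupal.org'
  puts format('%-55s %s', url, ok ? 'well-formed' : 'SUSPICIOUS')
end
```

If all three report as well-formed, requesting one of the same URLs with the command-line `curl` tool, outside DevDocs, is a reasonable next step for isolating a local libcurl problem.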

@simon04
Contributor

simon04 commented Jul 31, 2022

It seems the structure of the Drupal docs has changed. Please revisit item 4 in https://github.com/freeCodeCamp/devdocs/blob/main/docs/adding-docs.md

The following patch allowed me to scrape v7 docs:

diff --git a/lib/docs/filters/drupal/entries.rb b/lib/docs/filters/drupal/entries.rb
index 9da70441..b0c99d91 100644
--- a/lib/docs/filters/drupal/entries.rb
+++ b/lib/docs/filters/drupal/entries.rb
@@ -20,7 +20,7 @@ module Docs
         elsif subpath =~ /core!themes/
           'themes'
         else
-          css('.breadcrumb > a')[1].content
+          css('.breadcrumb a')[1].content
         end
       end
 
diff --git a/lib/docs/scrapers/drupal.rb b/lib/docs/scrapers/drupal.rb
index 3798caec..96cca5e9 100644
--- a/lib/docs/scrapers/drupal.rb
+++ b/lib/docs/scrapers/drupal.rb
@@ -10,7 +10,7 @@ module Docs
     html_filters.push 'drupal/entries', 'drupal/clean_html', 'title'
 
     options[:decode_and_clean_paths] = true
-    options[:container] = '#page-inner'
+    options[:container] = '#page'
     options[:title] = false
     options[:root_title] = 'Drupal'
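
The first hunk replaces the child combinator (`>`) with the descendant combinator (a space). If the breadcrumb links are no longer direct children of `.breadcrumb`, `css('.breadcrumb > a')[1]` returns `nil` and calling `.content` on it raises an error, while `css('.breadcrumb a')[1]` still finds the second link. The sketch below illustrates the difference on hypothetical markup (the actual api.drupal.org breadcrumb HTML is an assumption here). It uses Ruby's REXML with the equivalent XPath child (`/`) and descendant (`//`) axes instead of Nokogiri's `css`, so it runs without installing Nokogiri:

```ruby
require 'rexml/document'

# Hypothetical breadcrumb markup: the links are wrapped in <ol>/<li>,
# so they are descendants of .breadcrumb but not direct children.
html = <<~HTML
  <nav class="breadcrumb">
    <ol>
      <li><a href="/api/drupal">Drupal</a></li>
      <li><a href="/api/drupal/7.x">7.x</a></li>
      <li><a href="/api/drupal/groups/7.x">Topics</a></li>
    </ol>
  </nav>
HTML

doc = REXML::Document.new(html)

# Roughly CSS '.breadcrumb > a' (child axis). The exact @class match is a
# simplification of CSS class matching that is sufficient for this markup.
direct = REXML::XPath.match(doc, "//*[@class='breadcrumb']/a")
# Roughly CSS '.breadcrumb a' (descendant axis).
nested = REXML::XPath.match(doc, "//*[@class='breadcrumb']//a")

puts "direct children: #{direct.size}"
puts "descendants:     #{nested.size}"
# The entries filter reads index [1], i.e. the second matching link.
puts "second descendant: #{nested[1]&.text.inspect}"
```

With this markup the child-axis query matches nothing, so indexing `[1]` yields `nil`, which mirrors why the original selector would fail once the breadcrumb links gained a wrapper element.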

@TravisCarden
Author

Thank you, @simon04. I've applied your patch, but I still get the same runtime error when running thor docs:generate "Drupal@7". Since it works for you, I assume the problem is in my local setup. Should we try to debug it, or is your successful run good enough to move forward?

@TravisCarden
Author

I'm going to go out on a limb and mark this ready for review, unanswered question notwithstanding. 🙂

@TravisCarden TravisCarden marked this pull request as ready for review August 30, 2022 22:02
@TravisCarden TravisCarden requested a review from a team as a code owner August 30, 2022 22:02