# Proxy and user agent #14
It depends on the browser and session you are using.

**User Agent**

If you are using chromote (the default), you can use `Network.setUserAgentOverride`:

```r
session <- selenider_session()

session$driver$Network$setUserAgentOverride(
  userAgent = "My user agent"
)
```

If you are using Selenium and Chrome:

```r
session <- selenider_session(
  "selenium",
  browser = "chrome",
  options = selenium_options(
    client_options = selenium_client_options(
      capabilities = list(
        "goog:chromeOptions" = list(
          args = list("user-agent=My user agent")
        )
      )
    )
  )
)
```

Selenium and Firefox:

```r
session <- selenider_session(
  "selenium",
  browser = "firefox",
  options = selenium_options(
    client_options = selenium_client_options(
      capabilities = list(
        "moz:firefoxOptions" = list(
          prefs = list(
            "general.useragent.override" = "My user agent"
          )
        )
      )
    )
  )
)
```

You can get the current user agent with:

```r
execute_js_expr("return navigator.userAgent")
```

**Proxy Server**

Note that I haven't tested these, and used two other issues as reference.

In chromote, you have to pass arguments to the Chrome process using `chromote::set_chrome_args()`:

```r
chromote::set_chrome_args(c(
  chromote::get_chrome_args(),
  "--proxy-server=HOST:PORT"
))

session <- selenider_session()
```

With Selenium and Chrome:

```r
session <- selenider_session(
  "selenium",
  browser = "chrome",
  options = selenium_options(
    client_options = selenium_client_options(
      capabilities = list(
        "goog:chromeOptions" = list(
          args = list("--proxy-server=HOST:PORT")
        )
      )
    )
  )
)
```

Selenium and Firefox:

```r
session <- selenider_session(
  "selenium",
  browser = "firefox",
  options = selenium_options(
    client_options = selenium_client_options(
      capabilities = list(
        "moz:firefoxOptions" = list(
          prefs = list(
            "network.proxy.type" = 1,
            "network.proxy.socks" = "HOST",
            "network.proxy.socks_port" = PORT,
            "network.proxy.socks_remote_dns" = FALSE
          )
        )
      )
    )
  )
)
```

I might support this feature directly in selenider in the future.
---

@ashbythorpe Thank you so much for your exhaustive answer; very valuable information. For static websites I am using an approach along the lines of the sketch below; it is simple, easy and straightforward. It would be great to have such a thing in selenider for dynamic websites as well.
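A sketch with httr, where the host, port, credentials and user agent are all placeholders rather than real values:

```r
library(httr)

# Placeholders: substitute a real proxy host, port and credentials
response <- GET(
  "https://www.r-project.org/",
  use_proxy("HOST", port = PORT, username = "USERNAME", password = "PASSWORD"),
  user_agent("My user agent")
)
status_code(response)
```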
I was trying to reproduce the code above for selenider and chromote using your hints.

I was less lucky with the proxy implementation. Besides the IP address and port, I am also forced to provide credentials (a username and password). I did not find any documentation on how to do this, neither in the links you provided in your post above, nor here (referenced from the chromote project page): https://peter.sh/experiments/chromium-command-line-switches/
---

Hello @ashbythorpe, I did some more testing of changing the user agent, but with unsatisfactory results; please see below.

```r
session <- selenider::selenider_session("chromote", timeout = 10)

session <- session$driver$Network$setUserAgentOverride(
  userAgent = "Mozilla/5.0 (compatible; MSIE 10.0; Windows NT 6.2; Trident/6.0; MDDCJS)"
)

response <- selenider::open_url("https://www.r-project.org/")

browser_user_agent <- response$driver$Browser$getVersion()
browser_user_agent$userAgent
#> [1] "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) HeadlessChrome/121.0.6167.86 Safari/537.36"
```

Created on 2024-02-09 with reprex v2.0.2
---

I managed to get proxy authentication to work with chromote:

```r
library(selenider)

chromote::set_chrome_args(c(
  chromote::default_chrome_args(),
  "--proxy-server=HOST:PORT"
))

session <- selenider_session()

# Respond to authentication challenges with the proxy credentials
authenticate <- function(x) {
  id <- x$requestId
  response <- list(
    response = "ProvideCredentials",
    username = "USERNAME",
    password = "PASSWORD"
  )
  session$driver$Fetch$continueWithAuth(
    requestId = id,
    authChallengeResponse = response
  )
}

# Let every other paused request continue unchanged
continue_request <- function(x) {
  id <- x$requestId
  session$driver$Fetch$continueRequest(requestId = id)
}

session$driver$Fetch$enable(
  patterns = list(
    list(urlPattern = "*")
  ),
  handleAuthRequests = TRUE
)

session$driver$Fetch$requestPaused(
  callback_ = continue_request
)

session$driver$Fetch$authRequired(
  callback_ = authenticate
)
```

You essentially need to intercept every request that needs authentication, which is why the code is quite complicated. This will also cause a warning every time you navigate to a new webpage, since right now you can't use `.enable` methods manually (see rstudio/chromote#144). I'll probably add this as an explicit option in selenider.
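For example, with the handlers above registered, navigation should go through the authenticated proxy (an untested sketch; api.ipify.org simply echoes back the IP it sees):

```r
# If the proxy is working, this prints the proxy's IP, not your own
open_url("https://api.ipify.org/")
elem_text(s("*"))
```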
---

Weird, it seems `Browser.getVersion` gives a different value to JavaScript's `navigator.userAgent`:

```r
library(selenider)

session <- selenider_session()

session$driver$Network$setUserAgentOverride(
  userAgent = "My user agent"
)
#> named list()

execute_js_expr("return navigator.userAgent")
#> [1] "My user agent"

session$driver$Browser$getVersion()$userAgent
#> [1] "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) HeadlessChrome/121.0.6167.161 Safari/537.36"
```

Hopefully this means that `Browser.getVersion` returns an outdated user agent, rather than `Network.setUserAgentOverride` not working.

EDIT: puppeteer uses `Network.setUserAgentOverride` to set the user agent and `Browser.getVersion` to get the user agent, which is why this issue is relevant.
---

@ashbythorpe, thank you for your code; it's quite complex :). Unfortunately, I wasn't able to get it to work. When I ran the code with reprex(), the web page load timed out, and I have no clue why; please see below. If I ran this code manually, then …

```r
# Code of @ashbythorpe START
library(selenider)
#> Warning: package 'selenider' was built under R version 4.3.2

chromote::set_chrome_args(c(
  chromote::default_chrome_args(),
  "--proxy-server=69.58.9.215:7285"
))

session <- selenider::selenider_session()

authenticate <- function(x) {
  id <- x$requestId
  response <- list(
    response = "ProvideCredentials",
    username = "myusername",
    password = "mypassword"
  )
  session$driver$Fetch$continueWithAuth(
    requestId = id,
    authChallengeResponse = response
  )
}

continue_request <- function(x) {
  id <- x$requestId
  session$driver$Fetch$continueRequest(requestId = id)
}

session$driver$Fetch$enable(
  patterns = list(
    list(urlPattern = "*")
  ),
  handleAuthRequests = TRUE
)
#> named list()

session$driver$Fetch$requestPaused(
  callback_ = continue_request
)

session$driver$Fetch$authRequired(
  callback_ = authenticate
)
# Code of @ashbythorpe END

selenider::open_url("https://www.myip.com/")
#> Error: Chromote: timed out waiting for event Page.loadEventFired

selenider::s("#ip") |> selenider::elem_text()
#> Error in `selenider::elem_text()`:
#> ! To get the text inside `x`, it must exist.
#> ℹ After 4 seconds, `x` was not present.
```

Created on 2024-02-11 with reprex v2.1.0
---

Hi @rcepka,

**Example 1: using proxy.py**

proxy.py only supports HTTP requests, so the proxy server connection is successful, but the IP accessed by HTTPS websites is wrong.

On the command line, start proxy.py with basic authentication on port 8899. In R:

```r
library(selenider)

chromote::set_chrome_args(c(
  chromote::default_chrome_args(),
  "--proxy-server=127.0.0.1:8899"
))

session <- selenider_session()

x <- session$driver$Fetch$requestPaused(
  callback_ = function(x) NULL
)

session$driver$Fetch$disable()
#> named list()

authenticate <- function(x) {
  id <- x$requestId
  response <- list(
    response = "ProvideCredentials",
    username = "username",
    password = "password"
  )
  session$driver$Fetch$continueWithAuth(
    requestId = id,
    authChallengeResponse = response
  )
}

continue_request <- function(x) {
  id <- x$requestId
  session$driver$Fetch$continueRequest(requestId = id)
}

session$driver$Fetch$enable(
  patterns = list(
    list(urlPattern = "*")
  ),
  handleAuthRequests = TRUE
)
#> named list()

session$driver$Fetch$requestPaused(
  callback_ = continue_request
)

session$driver$Fetch$authRequired(
  callback_ = authenticate
)

open_url("http://api.ipify.org/")
elem_text(s("*"))
# my local IP

open_url("https://api.ipify.org/")
elem_text(s("*"))
# my local IP
```

However, while the IP is wrong, the logs on the command line (with excess information removed) show us that we are connecting to the proxy server.
Notably, we get the exact same result if we do this without authentication.

**Example 2: using a random proxy server**

Now, we use a random proxy server from https://free-proxy-list.net/. We choose one that supports HTTPS:

```r
library(selenider)

chromote::set_chrome_args(c(
  chromote::default_chrome_args(),
  "--proxy-server=167.86.115.218:8888"
))

session <- selenider_session()

open_url("http://api.ipify.org/")
elem_text(s("*"))
#> [1] "206.217.216.17"

open_url("https://api.ipify.org/")
elem_text(s("*"))
#> [1] "206.217.216.17"
```

This IP is different from my local IP, so the proxy works for both HTTP and HTTPS here.

So yeah, I think the problem is most likely that your proxy server does not support HTTPS requests.
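As an aside, you can sanity-check a proxy's HTTPS support outside the browser entirely. A sketch using the curl package (host, port and credentials are placeholders):

```r
library(curl)

# Placeholders: substitute the real proxy details
h <- new_handle(
  proxy = "HOST:PORT",
  proxyuserpwd = "USERNAME:PASSWORD"
)
res <- curl_fetch_memory("https://api.ipify.org", handle = h)
rawToChar(res$content)  # prints the proxy's IP if HTTPS via the proxy works
```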
---

Hello,

How can I implement a proxy and a user agent, please?