Blacklisting titles #52

Open
papuass opened this issue Apr 8, 2021 · 1 comment


@papuass

papuass commented Apr 8, 2021

This might be a controversial issue, but there is clearly a problem with inflated page views in the Wikimedia stats.
Such cases can usually be detected by looking at the ratio of mobile to desktop page views: one of the two heavily dominates.
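A minimal sketch of that heuristic, assuming you already have desktop and mobile view counts for a page (for example, mobile-web plus mobile-app). The function name `dominant_platform` and the 0.95 threshold are hypothetical; the issue does not specify a cut-off.

```python
from typing import Optional


def dominant_platform(desktop_views: int, mobile_views: int,
                      threshold: float = 0.95) -> Optional[str]:
    """Return "desktop" or "mobile" if that platform's share of all views
    is at least `threshold`, otherwise None."""
    total = desktop_views + mobile_views
    if total == 0:
        return None  # no traffic at all, nothing to judge
    if desktop_views / total >= threshold:
        return "desktop"
    if mobile_views / total >= threshold:
        return "mobile"
    return None
```

With about 800 daily desktop views and, say, 10 mobile views (a made-up number), `dominant_platform(800, 10)` returns `"desktop"`.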

Currently, there is an article [1] that gets a fairly constant 800 desktop hits daily [2], which is enough to put it in the lvwiki top 3 on most days. In the past, an article about a porn site did the same thing on enwiki.

[1] https://lv.wikipedia.org/wiki/Karless_Pud%C5%BEdemons
[2] https://pageviews.toolforge.org/?project=lv.wikipedia.org&platform=all-access&agent=user&redirects=0&range=latest-20&pages=Karless_Pud%C5%BEdemons
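As far as I know, the Pageviews tool in [2] reads from the public Wikimedia REST pageviews API, which reports per-article views split by access method (`desktop`, `mobile-web`, `mobile-app`) and agent (`all-agents`, `user`, `spider`, `automated`). Below is a sketch that builds the daily per-article query URL and sums the returned counts, assuming the `per-article` endpoint layout and the `items`/`views` response fields. The helper names are hypothetical.

```python
from urllib.parse import quote

# Base of the Wikimedia REST pageviews endpoint (per-article time series).
API_BASE = "https://wikimedia.org/api/rest_v1/metrics/pageviews/per-article"


def per_article_url(project: str, article: str, access: str,
                    start: str, end: str, agent: str = "user") -> str:
    """Build a daily per-article pageviews URL.

    access: "all-access", "desktop", "mobile-app" or "mobile-web".
    start/end: dates formatted as YYYYMMDD.
    """
    title = quote(article, safe="")  # percent-encode non-ASCII and "/"
    return f"{API_BASE}/{project}/{access}/{agent}/{title}/daily/{start}/{end}"


def total_views(payload: dict) -> int:
    """Sum the "views" field over the "items" list of a decoded response."""
    return sum(item["views"] for item in payload.get("items", []))
```

Fetching is left to any HTTP client; Wikimedia asks API clients to send a descriptive User-Agent header. Querying `desktop` and `mobile-web` separately gives the two counts the heuristic above needs.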

Should there be a user-maintained blacklist for hatnote top?

@slaporte
Member

We excluded one page from enwiki based on signs of inauthentic traffic; I think that's the other site you mentioned. See get_data.py#L77. We could add more pages to that list when the traffic looks artificial or the page doesn't belong.

If we factor it out into a user-maintained blacklist, I think it would be good to explain our exclusion methodology for the sake of transparency. I'd also love it if there were a way to automatically flag (and possibly exclude?) pages that show signs of bot traffic, such as an unusual ratio of mobile to desktop traffic.
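One way the exclusion list could be factored out, sketched under assumptions: a plain-text file with one `Title<TAB>reason` line per page, plus the mobile/desktop check as an optional automatic flag. Every exclusion keeps a human-readable reason so the methodology stays visible. The file format, the function names `parse_exclusions` and `filter_top`, and the 0.95 threshold are hypothetical and not taken from get_data.py.

```python
from typing import Dict, Iterable, List, Tuple


def parse_exclusions(lines: Iterable[str]) -> Dict[str, str]:
    """Parse 'Title<TAB>reason' lines; skip blank lines and # comments."""
    exclusions: Dict[str, str] = {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        title, _, reason = line.partition("\t")
        exclusions[title.strip()] = reason.strip() or "no reason given"
    return exclusions


def filter_top(
    rows: Iterable[Tuple[str, int, int]],
    manual: Dict[str, str],
    threshold: float = 0.95,
    auto_exclude: bool = False,
) -> Tuple[List[str], Dict[str, str], Dict[str, str]]:
    """Split (title, desktop_views, mobile_views) rows into kept titles,
    excluded titles and auto-flagged titles, each with a reason."""
    kept: List[str] = []
    excluded: Dict[str, str] = {}
    flagged: Dict[str, str] = {}
    for title, desktop, mobile in rows:
        if title in manual:
            excluded[title] = "manual: " + manual[title]
            continue
        total = desktop + mobile
        # Flag pages where one platform accounts for >= threshold of views.
        if total and max(desktop, mobile) / total >= threshold:
            side = "desktop" if desktop >= mobile else "mobile"
            reason = f"auto: {side} share >= {threshold:.0%}"
            if auto_exclude:
                excluded[title] = reason
                continue
            flagged[title] = reason
        kept.append(title)
    return kept, excluded, flagged
```

With `auto_exclude=False`, a page at 800 desktop and 10 mobile views stays in the top list but shows up in `flagged` with the reason `"auto: desktop share >= 95%"`, so a human can decide whether to add it to the manual list.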
