Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

WIP - Reworking the grant_revoke_gem_authority script #47

Draft
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

jeremyf
Copy link
Contributor

@jeremyf jeremyf commented Jun 11, 2020

Prior to this commit, we relied on Github usernames from Samvera to
dictate who had access to Rubygems.org. I believe this created the
following problem:

First, we needed branching logic. Some github accounts had an
email/username that did not correspond to their Rubygems email/username.
Some github accounts had no corresponding Rubygems email/usernames.

This implied a somewhat confusing relationship that, in my opinion,
created more confusing code.

Second, I'm opting to leave the samvera-depcrated gems alone. You can
run WITH_DEPRECATED=true to update those gems.

Third, I'm attempting a single request that both adds and revokes
authors. The reason being that I encountered throttling issues when I
ran the WITH_REVOKE using the prior logic. I also added a sleep
directive so as to be a bit kinder against an unclear throttling
threshold.

Fourth, Valkyrie is now under the Samvera aegis, so I've removed it from
the special-case list.

Fifth, I have added a bit of progress reporting, namely printing to
stdout the current gem that the script is processing.

Sixth, I have encountered Rubygems throttling issues. Rubygems has a 10
requests per second rate limit (see docs). I thought at first we
could add all of the '-a' and '-r' switches in one request. However,
looking at the implementation, the owner command calls the API
once per switch.

Please note, I have created the
data/definitive-rubygems-authors-list.txt to the best of my
ability. I'm certain I may have missed people.

Prior to this commit, we relied on Github usernames from Samvera to
dictate who had access to Rubygems.org.  I believe this created the
following problem:

First, we needed branching logic. Some github accounts had an
email/username that did not correspond to their Rubygems email/username.
Some github accounts had no corresponding Rubygems email/usernames.

This implied a somewhat confusing relationship that, in my opinion,
created more confusing code.

Second, I'm opting to leave the samvera-depcrated gems alone. You can
run WITH_DEPRECATED=true to update those gems.

Third, I'm attempting a single request that both adds and revokes
authors.  The reason being that I encountered throttling issues when I
ran the WITH_REVOKE using the prior logic.  I also added a sleep
directive so as to be a bit kinder against an unclear throttling
threshold.

Fourth, Valkyrie is now under the Samvera aegis, so I've removed it from
the special-case list.

Fifth, I have added a bit of progress reporting, namely printing to
stdout the current gem that the script is processing.

Sixth, I have encountered Rubygems throttling issues. Rubygems has a 10
requests per second rate limit ([see docs][1]). I thought at first we
could add all of the '-a' and '-r' switches in one request. However,
looking at the [implementation][2], the `owner` command calls the API
once per switch.

Please note, I have created the
`data/definitive-rubygems-authors-list.txt` to the best of my
ability. I'm certain I may have missed people.

[1]:https://guides.rubygems.org/rubygems-org-api/#rate-limits
[2]:https://github.com/rubygems/rubygems/blob/6edf24739fe76455af82491e539abbbbfd87b065/lib/rubygems/commands/owner_command.rb#L90
@jeremyf jeremyf changed the title Reworking the grant_revoke_gem_authority script WIP - Reworking the grant_revoke_gem_authority script Jun 11, 2020
@jeremyf
Copy link
Contributor Author

jeremyf commented Jun 11, 2020

I marked this as WIP because I need to again run the script (I hit throttling limits and may have been temporarily blocked). However, I would like others to review this if they have capacity.

@jeremyf
Copy link
Contributor Author

jeremyf commented Jun 12, 2020

I have attempted to run the author add/revoke script this morning, but again encountered what appears to be 429 HTTP errors. I may be on a naughty list.

@jeremyf jeremyf requested review from bess and mjgiarlo June 12, 2020 12:21
@mjgiarlo
Copy link
Member

Happy to review on Monday. Thanks for working on this!

Copy link
Member

@mjgiarlo mjgiarlo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comments and questions for you to consider. Thanks for taking this beast on.

Comment on lines +2 to +3
# gems owners. Each line must either be either an email address
# of the associated Rubygems profile.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think there may be a typo in this second sentence. How about:

Suggested change
# gems owners. Each line must either be either an email address
# of the associated Rubygems profile.
# gems owners. Each line must either be a rubygems.org email address
# or username.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Recommend removing this address as @carolyncole no longer works at Penn State. If she still works w/ Samvera-y things, she can supply her new rubygems.org username/email address, I reckon. (Hi, Carolyn! Wishing you well!)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not familiar with this one. Maybe someone else can vouch.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please remove this one, and revoke his access to our gems.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are you doing Samvera-y things, @jenlindner? (And, hi!)

Comment on lines +80 to +81
File.expand_path("../../data/definitive-rubygems-authors-list.txt", __FILE__)
).split("\n").map { |line| line.sub(/ *#.*$/, "") }.select { |line| !line.empty? }
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't feel strongly about this, but a different way to deal with this is to store the list of gem authors as a simple yaml file, and then the YAML parser will deal with the empty lines and comments for us. Something like:

an advantage of this approach is we can validate the YAML in CI (which I see this project doesn't have... but it could?).

# TODO: We are twice calling `gem owner` for information. We could
# rework the logic to only call it once.
sleep(GEM_SLEEP_INTERVAL)
if system("gem owner #{name} > /dev/null 2>&1")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, what is the purpose of this call? To flag failures for stashing in @bogus_gem_names?

def gem_owner_with_error_check(gemname, params)
command = "gem owner #{gemname} #{params}"
puts "running: #{command}" if VERBOSE
OWNERSHIP_ACTIONS = {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have a slight preference for declaring constants together near the top of a file but I'm not sure this file already respects this preference, so 🤷‍♂️

@gem_names.reject { |gemname| FALSE_POSITIVES.include?(gemname) }.sort.each do |gemname|

@gem_names.sort.each do |gemname|
$stdout.puts "Updating #{gemname}..."
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍


committers_to_add = committer_emails - current_committers - SKIP_EMAILS
add_params = committers_to_add.map { |email| "-a #{email}" }.join(' ')
gem_owner_with_error_check(gemname, add_params)
end

if @bogus_gem_names.any?
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if it's worth spitting this out. If not, can also remove an extra gem owner call above, possibly? It seemed a good idea at some point, buuuuuut...

@jrochkind
Copy link

This is messy, I appreciate you taking a stab at it!

One thing to note though -- I think one effect of the code before mapping from github username to rubygems email address, is that if someone was removed as a github committer, they would be automatically removed as a rubygems releaser on next sync?

I could be wrong about this -- is this true? If so, giving it up may be "dangerous". While this relies on someone running the sync script (in such a way that it removes, not just adds, authorized releasers), it provides some protection from forgetting to remove a rubygems owner who is no longer a committer.

Under this code, even if someone is removed as a github committer, they won't be removed as a rubygems releaser unless they are manually removed from the new definitive-rubygems-authors-list.txt file.

This seems to make it a lot easier to have someone removed as a committer who is accidentally still a rubygems releaser. In some sense rubygems releaser is even more power than github committer. It gives you the power to release a gem that isn't even from this repo at all, that has whatever code you want in it. There could be a malicious person who used to be a samvera dev; more likely, someone who used to (or is) a samvera dev with a compomised account.

Having to maintain the github=>rubygems email mapping meant that if you forgot to maintain the mapping, you could add someone as a committer and they would not get rubygems release ability, which is annoying, they'd have to figure out that they need to go add themselves to the mapping too. But the new way of doing things means if you forget to maintain the mapping, you could have someone remaining a rubygems releaser that should not be, an arguably worse failure mode.

Or am I worrying unnecessarily or misunderstanding? Thoughts?

@mjgiarlo
Copy link
Member

This is messy, I appreciate you taking a stab at it!

One thing to note though -- I think one effect of the code before mapping from github username to rubygems email address, is that if someone was removed as a github committer, they would be automatically removed as a rubygems releaser on next sync?

I could be wrong about this -- is this true? If so, giving it up may be "dangerous". While this relies on someone running the sync script (in such a way that it removes, not just adds, authorized releasers), it provides some protection from forgetting to remove a rubygems owner who is no longer a committer.

Under this code, even if someone is removed as a github committer, they won't be removed as a rubygems releaser unless they are manually removed from the new definitive-rubygems-authors-list.txt file.

This seems to make it a lot easier to have someone removed as a committer who is accidentally still a rubygems releaser. In some sense rubygems releaser is even more power than github committer. It gives you the power to release a gem that isn't even from this repo at all, that has whatever code you want in it. There could be a malicious person who used to be a samvera dev; more likely, someone who used to (or is) a samvera dev with a compomised account.

Having to maintain the github=>rubygems email mapping meant that if you forgot to maintain the mapping, you could add someone as a committer and they would not get rubygems release ability, which is annoying, they'd have to figure out that they need to go add themselves to the mapping too. But the new way of doing things means if you forget to maintain the mapping, you could have someone remaining a rubygems releaser that should not be, an arguably worse failure mode.

Or am I worrying unnecessarily or misunderstanding? Thoughts?

I've run this script many, many times and I don't think I've ever bothered to run a revocation, FWIW. (I have manually revoked folks' access on a onesy-twosy basis in the past.)

@jrochkind
Copy link

jrochkind commented Jun 15, 2020

I've run this script many, many times and I don't think I've ever bothered to run a revocation, FWIW. (I have manually revoked folks' access on a onesy-twosy basis in the past.)

I guess the question is just how much y'all are concerned about accidentally leaving releasers that shouldn't be. Past practice was clearly not perfect here (on several grounds), but is it a problem to make a commit that makes it harder to get right? And does this do that?

Up to you all, I just raise the question. One reason I raise the question is there HAS been a rash of malicious gem releases in rubygems lately.

More disturbingly, a community-related gem seems to have gotten a malicious release, although not one from the samvera org itself -- aasm-active-fedora -- oops, aasm-active-fedora was was a typosquat, an underscore for a hyphen, not actually a compromised release. Still made me think about vulnerability of samvera gems. https://blog.reversinglabs.com/blog/mining-for-malicious-ruby-gems

All it takes is a compromise to any rubygems account in this list, and the compromiser could do malicious releases of any samvera gem. Which is enough to make one a bit nervous about a giant list of authorized releasers at all, but also made me worry about increasing the attack area by accidentally leaving accounts as releasers that aren't even rubygems committers aymore.

@jrgriffiniii
Copy link
Contributor

jrgriffiniii commented May 11, 2022

As this was still a WIP pull request, I am going to convert this to a draft.

@jrgriffiniii jrgriffiniii marked this pull request as draft May 11, 2022 14:14
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Status: Backlog
Development

Successfully merging this pull request may close these issues.

5 participants