-
Notifications
You must be signed in to change notification settings - Fork 272
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
BUG: Error during OAuth callback / missing constraints for role_membership
table
#3811
Comments
SQL query hint:
It looks like a UNIQUE constraint on |
For future reference: While looking a little closer at logs when testing a fix for this, I noticed that in one case, there were two overlapping OAuth requests happening at the same time, processed by different pods. Both requests had a separate auth code and state, i.e. both requests were part of separate OAuth flows. In order to sync groups, Aleph first clears all current group memberships, then inserts new group memberships based on the data in the OAuth token retrieved from the ID provider. However, Aleph does this in a single transaction, so I wouldn’t expect this to be the issue. |
Some additional context:
I think the issue is that while deleting and inserting the role memberships is part of a single transaction, determining which role memberships to delete is not. When clearing role memberships for a given user, SQLAlchemy doesn’t construct a query like this: DELETE FROM role_membership WHERE member_id = 123 Instead, it deletes specific rows based on data read before: DELETE FROM role_membership WHERE member_id = 123 AND group_id = 456
DELETE FROM role_membership WHERE member_id = 123 AND group_id = 789 If a user is added to a new group, the following race condition can occur when two overlapping OAuth requests are handled:
As a result, the role membership for Group C is stored twice. |
This likely isn't the root cause for #3811, but it will help with debugging in the future.
While this doesn't fix the root cause for #3811, it ensures that this error doesn't result in inconsistent data that have previously permanently blocked users from sign-in. Having sensible constraints on tables is a good idea anyway.
* Make name/email customizable in OAuth mocks * Fix OAuth test setup This was an oversight and should have always called `super().setUp()` instead of `super().setUpClass()`. This didn’t fail when running tests for the entire file (`pytest test_sessions_api.py`), but it did always fail when running individual tests (`pytest test_sessions_api.py -k oauth_callback`). * Test OAuth group sync * Log warning when trying to add duplicate role membership This likely isn't the root cause for #3811, but it will help with debugging in the future. * Add primary key constraint on `role_membership` table While this doesn't fix the root cause for #3811, it ensures that this error doesn't result in inconsistent data that have previously permanently blocked users from sign-in. Having sensible constraints on tables is a good idea anyway.
Describe the bug
Some users experience a 500 error when signing in using OAuth.
The
role_membership
table stores which groups a user is part of. It has two columns,group_id
androle_id
(migration). However, it doesn’t define any constraints (composite primary key or a unique constraint, so it is possible to add duplicate rows to the table.Duplicate rows in this table can cause errors during the OAuth callback. After authenticating a user, Aleph "syncs" the groups the user is part of. It does this by first deleting all rows in the
role_membership
for the user, then adding new rows for all of the groups the users is currently part of. Groups are encoded in the OAuth token returned by the identity provider, e.g. Keycloak.In case of duplicate rows, there’s a mismatch between the number of rows SQLAlchemy expects to delete and the number of rows that are actually deleted. This results in the following error:
To Reproduce
Steps to reproduce the behavior:
role_membership
table, you should see one row representing the group membership set up in step 2.Aleph version
3.17.0 and 4.0.0-rc33
Additional context
The text was updated successfully, but these errors were encountered: