Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Handle unregistered users in BearerTokenAuthMechanism and implement user registration mechanism #10972

Open
wants to merge 54 commits into
base: develop
Choose a base branch
from

Conversation

GPortas
Copy link
Contributor

@GPortas GPortas commented Oct 28, 2024

What this PR does / why we need it:

This PR includes a mechanism for registering new users in Dataverse through a Bearer Token.

BearerTokenAuthMechanism

The Bearer Token authentication functionality (available through feature flag) has been extended to properly handle cases where BearerTokenAuthMechanism successfully validates the token but cannot identify any Dataverse user because there is no account associated with the token. See the current development version of BearerTokenAuthMechanism. (https://github.com/IQSS/dataverse/blob/develop/src/main/java/edu/harvard/iq/dataverse/api/auth/BearerTokenAuthMechanism.java#L58).

The BearerTokenAuthMechanism has been extended to return a meaningful error in the scenario described above. Specifically, the calling user would receive a 403 error, indicating that the token is validated but the user is not registered in Dataverse.

Additionally, the entire BearerTokenAuthMechanism class has been refactored, moving authentication and token verification logic to AuthService, since this logic is also utilized by the new user registration functionality included in this PR, and it also results in a simpler version of BearerTokenAuthMechanism, more testable and with more limited responsability.

User registration endpoint

The new endpoint takes the bearer token and a UserDTO object with the required properties to create a user account in Dataverse. It only works if the bearer token feature flag is enabled. A RegisterOidcUserCommand handles the validation of the bearer token and user data, and the account creation process. It can return various errors, including a new InvalidFieldsCommandException, which lists any validation errors in the UserDTO. This exception is now handled by AbstractApiBean like other command exceptions and can be used for future purposes.

Which issue(s) this PR closes:

Special notes for your reviewer:

  • See Handle unregistered users in BearerTokenAuthMechanism and implement user registration mechanism #10972 (comment) to understand why the OIDC UserInfo endpoint is not used in this design

  • The IT for the new registration endpoint is disabled because it depends on the containerized environment, which is not available in Jenkins. Additionally, the feature flag would not be active, so it would fail for this reason as well. Changes have been made to the Keycloak realm JSON to suit the needs of this IT, particularly the modification of roles in the test realm to allow an admin user to create a random test user in the IT setup.

Suggestions on how to test this:

Visual inspection of the API integration tests. Overall, this PR is extensively tested with unit and integration tests, but it is in the ITs where we can see how the endpoint calls work and the different responses that can be obtained.

We can also run this branch locally and reproduce the same calls made in the ITs using curl.

Does this PR introduce a user interface change? If mockups are available, please link/include them here:

No

Is there a release notes update needed for this change?:

I don't think so. Since this is still under a feature flag.

Additional documentation:

None

This comment has been minimized.

This comment has been minimized.

…earer token auth feature flags compatibility

This comment has been minimized.

This comment has been minimized.

This comment has been minimized.

This comment has been minimized.

@GPortas GPortas removed their assignment Nov 15, 2024

This comment has been minimized.

2 similar comments

This comment has been minimized.

This comment has been minimized.

@qqmyers qqmyers self-assigned this Nov 15, 2024
Copy link
Member

@qqmyers qqmyers left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good overall - lots of changes, lots of tests! I made a few suggestions for changes and this probably needs a release note.

);
registerOidcUserResponse.then().assertThat()
.statusCode(UNAUTHORIZED.getStatusCode())
.body("message", equalTo("Unauthorized bearer token."));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In general we try to avoid string comparisons so that tests don't break when messages change.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In this case, the error messages are meaningful for the test assertions, as depending on the point where the flow fails, one message or another will be returned, even though the HTTP error code is the same (401) (See the different AuthorizationExceptions that are thrown in

public OIDCUserInfo verifyOIDCBearerTokenAndGetUserIdentifier(String bearerToken) throws AuthorizationException {
).

As far as I know, Dataverse does not implement custom response error codes, so I had to rely on the response error messages for this test. I understand the concern about string comparisons and am open to revisiting this approach if alternative mechanisms (such as custom response error codes) become available in the future.

doc/sphinx-guides/source/api/auth.rst Outdated Show resolved Hide resolved

curl -H "Authorization: Bearer $TOKEN" -X POST http://localhost:8080/api/users/register --data '{"termsAccepted":true}'

It is essential to send a JSON that includes the property ``termsAccepted`` set to true, which indicates that you accept the terms of service of Dataverse. Otherwise, you will not be able to create an account.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What terms? Is it https://guides.dataverse.org/en/latest/api/native-api.html#get-api-terms-of-use-url - if so can that be referenced here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here, I am referring to the terms of use of the installation, the ones present in the sign-up form and are configurable in the installation.

Screenshot 2024-11-19 at 12 21 37

Screenshot 2024-11-19 at 12 26 11

I have rephrased the text to: "It is essential to send a JSON that includes the property termsAccepted set to true, which indicates that you accept the Terms of Use of the installation. Otherwise, you will not be able to create an account."

I don't think I can provide any links to the terms as they are embedded and not present by default.


It is essential to send a JSON that includes the property ``termsAccepted`` set to true, which indicates that you accept the terms of service of Dataverse. Otherwise, you will not be able to create an account.

In this JSON, we can also include the fields ``position`` or ``affiliation``, in the same way as when we register a user through the Dataverse UI. These fields are optional, and if not provided, they will be persisted as empty in Dataverse.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are these never available from the token/user info? Should they be covered by the api-bearer-auth-json-claims flag?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

They are not available, at least by default.

These are not standard OIDC claims, as these properties are specific to the Dataverse domain. They cannot be found in the PersonClaims class of the Nimbus SDK neither.

In my opinion, these properties should not be linked to the api-bearer-auth-json-claims flag, as they represent Dataverse-specific properties that should be provided during sign-up (if desired, since they are optional).

- ``lastName``
- ``emailAddress``

Note that even if they are included in the JSON, if it is possible to retrieve the corresponding claims from the identity provider, these values will be ignored and the ones from the identity provider will be used instead.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I understood, in the comments, you suggested that ignoring the values could be confusing. Would it be better to just fail if you send info that is already in the token? (+1 from me)

@@ -3343,6 +3343,12 @@ please find all known feature flags below. Any of these flags can be activated u
* - api-session-auth
- Enables API authentication via session cookie (JSESSIONID). **Caution: Enabling this feature flag exposes the installation to CSRF risks!** We expect this feature flag to be temporary (only used by frontend developers, see `#9063 <https://github.com/IQSS/dataverse/issues/9063>`_) and for the feature to be removed in the future.
- ``Off``
* - api-bearer-auth
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FWIW: I originally thought that there would be a separate api-bearer-registration flag rather than overloading this flag. Is it always true that if people allow bearer tokens that also want people to be able to create new accounts this way? I'm not sure, but unless someone thinks this use case is real, I'm OK with it as is with one flag.

Copy link
Contributor Author

@GPortas GPortas Nov 19, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is a separate flag. I'm not sure if perhaps you have missed it. There are two flags: api-bearer-auth and api-bearer-auth-provide-missing-claims (the one used in registration when the provider's claims are insufficient). I added both to the docs, as api-bearer-auth was missing from the Feature Flags section.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right. I meant 3 - allow authorizing by token, allow reg by token, allow reg by token with user supplied info. At QDR (a fork) we create accounts via a different mechanism tied to our single sign on mechanism and will eventually want api access via token but not registration via token. As I said above, I don't know that there's a real non-fork use case where other people will want this, so I think it is OK to stay with just the 2 flags you have. We can make a change in the future if needed.

for (OIDCAuthProvider provider : providers) {
try {
// Retrieve both user identifier and user info
Optional<UserRecordIdentifier> userRecordIdentifier = provider.getUserIdentifier(accessToken);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

getUserIdentifier calls getUserInfo at (

) - can the code be refactored to avoid calling it twice? Maybe calling getUserInfo first and then just getUserRecord ?

FWIW: I was concerned that I didn't see the access token being validated anywhere, but since the callback to the userinfo has to succeed for either the getUserIdentifier or getUserInfo to succeed, we know that the token is valid.

userDTO.setEmailAddress(getValueOrDefault(userClaimsInfo.getEmailAddress(), userDTO.getEmailAddress()));
} else {
// Always use the claims provided by the OIDC provider, regardless of whether they are null or not
userDTO.setUsername(userClaimsInfo.getPreferredUsername());
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Per the comment earlier - should we fail here if you send JSON with these values/with conflicting values?

src/main/java/propertyFiles/Bundle.properties Outdated Show resolved Hide resolved
// If either is present, return the result
if (userRecordIdentifier.isPresent() || userInfo.isPresent()) {
logger.log(Level.FINE, "Bearer token detected, provider {0} confirmed validity and provided user info", provider.getId());
return new OIDCUserInfo(userRecordIdentifier.get(), userInfo.get());
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you return the OAuth2UserRecord from getUserRecord as suggested above, do you have all the info you need? Or could you add the missing fields to it? Versus creating a new aggregator class?

@qqmyers qqmyers assigned GPortas and unassigned qqmyers Nov 18, 2024

This comment has been minimized.

1 similar comment

This comment has been minimized.

This comment has been minimized.

Copy link

📦 Pushed preview images as

ghcr.io/gdcc/dataverse:10959-bearer-token-auth-ext
ghcr.io/gdcc/configbaker:10959-bearer-token-auth-ext

🚢 See on GHCR. Use by referencing with full name as printed above, mind the registry name.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
FY25 Sprint 10 FY25 Sprint 10 (2024-11-06 - 2024-11-20) GREI Re-arch Issues related to the GREI Dataverse rearchitecture Original size: 30 Size: 30 A percentage of a sprint. 21 hours. (formerly size:33) SPA.Q4.4 OIDC login + API authentication SPA These changes are required for the Dataverse SPA User Role: API User Makes use of APIs
Projects
Status: In Review 🔎
Development

Successfully merging this pull request may close these issues.

Handle unregistered users in BearerTokenAuthMechanism
5 participants