Dataset Edit Performance Improvements #10890
Draft
qqmyers wants to merge 44 commits into IQSS:develop from GlobalDataverseCommunityConsortium:DANS_Performance2
It apparently causes new datafiles, which don't have create dates yet, to start being persisted, which causes a failure with an SQL insert into the dataset table with a null createdate. Moving it back to this location ensures the datafiles have create dates and avoids the exception. It's not clear to me why trying to get the authenticatedUser in the updateDatasetUser() call causes this.
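A minimal sketch of the ordering constraint described in this comment, in plain Java with hypothetical FileStub and CreateDateOrdering classes (not Dataverse code): every new file gets a create date before anything that might write it to the database. One possible explanation for the surprise, not confirmed here, is JPA's default FlushModeType.AUTO, under which running a query (such as looking up the authenticated user) may flush pending entities first, before their create dates have been set.

```java
import java.sql.Timestamp;
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a data file entity; the real Dataverse DataFile class differs.
class FileStub {
    final String name;
    Timestamp createDate; // null until assigned, as for newly uploaded files

    FileStub(String name) {
        this.name = name;
    }
}

class CreateDateOrdering {
    // Assign a create date to every file that lacks one and return how many were set.
    // Calling this before any operation that could trigger a flush means no row is
    // written with a null create date.
    static int assignMissingCreateDates(List<FileStub> files, Timestamp now) {
        int assigned = 0;
        for (FileStub f : files) {
            if (f.createDate == null) {
                f.createDate = now;
                assigned++;
            }
        }
        return assigned;
    }

    // Simulated flush: fails, like the SQL insert did, if any create date is still null.
    static void flush(List<FileStub> files) {
        for (FileStub f : files) {
            if (f.createDate == null) {
                throw new IllegalStateException("null createdate for " + f.name);
            }
        }
    }
}
```

Files that already have a create date keep it; only the missing ones are filled in, so calling the helper again is harmless.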
CacheFactoryBeanTest.testAuthenticatedUserGettingRateLimited:171 expected: <120> but was: <122>
qqmyers force-pushed the DANS_Performance2 branch from 7c46bfc to 3329636 on October 9, 2024 18:21
qqmyers force-pushed the DANS_Performance2 branch from f18d569 to ad3a582 on October 17, 2024 17:27
qqmyers force-pushed the DANS_Performance2 branch from 2910fdf to 49f48ef on October 18, 2024 18:40
This previously gave a "null encountered in unit of work clone" error.
What this PR does / why we need it: This PR includes multiple changes to the UpdateDatasetVersionCommand to improve performance/scalability when editing datasets with large numbers of files. Key changes include:
Which issue(s) this PR closes:
Closes #10138
Special notes for your reviewer: In my testing on a dataset with 10K files, the time required for the UpdateDatasetVersionCommand in the DatasetPage.save() method to complete (as measured by logging in the save method) after a one-character change to the description averaged ~30 seconds. With all the changes in this PR, it now takes ~12-13 seconds. In general, verifying the impact of individual changes is hard:
That said, I would estimate that the first two changes contribute a ~4 second reduction each (the feature flag alone would save 12 seconds, but the differencing PR already saves ~8 seconds there).
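Reading the estimate above literally: if the feature flag alone would save ~12 s and the differencing PR already recovers ~8 s of that, the flag's remaining contribution is about 12 - 8 = 4 s. The differencing PR is only referenced here, not described, so the following is a generic illustration of the differencing technique (an assumption, not that PR's code): compare the saved and edited per-file metadata by a hypothetical file id and keep only entries that are new or changed, so unchanged files need no per-file update. FileMeta and MetadataDiff are hypothetical names.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;

// Hypothetical, simplified per-file metadata; the real FileMetadata has many more fields.
final class FileMeta {
    final long fileId;
    final String label;
    final String description;

    FileMeta(long fileId, String label, String description) {
        this.fileId = fileId;
        this.label = label;
        this.description = description;
    }

    // True when the user-editable fields match (null-safe comparison).
    boolean sameContentAs(FileMeta other) {
        return Objects.equals(label, other.label)
                && Objects.equals(description, other.description);
    }
}

class MetadataDiff {
    // Return only entries that are new or whose content differs from the saved state;
    // everything else can be skipped when writing the update.
    static List<FileMeta> changed(List<FileMeta> saved, List<FileMeta> edited) {
        Map<Long, FileMeta> byId = new HashMap<>();
        for (FileMeta m : saved) {
            byId.put(m.fileId, m);
        }
        List<FileMeta> result = new ArrayList<>();
        for (FileMeta m : edited) {
            FileMeta before = byId.get(m.fileId);
            if (before == null || !before.sameContentAs(m)) {
                result.add(m);
            }
        }
        return result;
    }
}
```

The hash map keeps the comparison linear in the number of files. A real implementation would also need to handle deleted files and compare every persisted field, not just the two shown.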
Suggestions on how to test this: All the automated tests should pass, and any/all variants of making changes to a dataset should work as before. There should be no changes w.r.t. the db-level updates except for the change to not update datafile lastmodified dates. Performance should be improved overall, and scaling should be improved. The simplest way to test this might be to turn on fine logging for the DatasetPage, where I've added logging of the time to run the update command. (Note that the overall time seen in the UI includes both the time to save the changes and the time to reload the page. The latter, with 10K files, still takes many seconds and hasn't been improved in this PR.)
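The timing measurement described above can be sketched with java.util.logging: wrap the command execution, measure wall-clock time with System.nanoTime(), and log at FINE so the message only appears when fine logging is enabled for that logger. The logger name example.DatasetPage and the TimedCommand helper are hypothetical; the PR's actual logging code is not shown here.

```java
import java.util.concurrent.Callable;
import java.util.logging.Level;
import java.util.logging.Logger;

class TimedCommand {
    // Hypothetical logger name; in Dataverse the DatasetPage class logger would be used.
    private static final Logger logger = Logger.getLogger("example.DatasetPage");

    // Runs the given work, logs its wall-clock duration at FINE (even if it throws),
    // and returns its result.
    static <T> T timed(String label, Callable<T> work) throws Exception {
        long start = System.nanoTime();
        try {
            return work.call();
        } finally {
            long elapsedMs = (System.nanoTime() - start) / 1_000_000;
            // The isLoggable guard skips building the message when FINE is disabled.
            if (logger.isLoggable(Level.FINE)) {
                logger.fine(label + " took " + elapsedMs + " ms");
            }
        }
    }
}
```

Because the log line sits in a finally block, a failed save is still timed, which helps when comparing runs on a large dataset.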
Does this PR introduce a user interface change? If mockups are available, please link/include them here:
Is there a release notes update needed for this change?: Probably one for any/all performance updates going into 6.5, along with announcing the feature flag and the change to file last-modified behavior.
Additional documentation: to be added