GDrive remote support #2551
Looks neat! I see that it is still WIP and has some things like commented-out code, but I decided to take a quick look for now 🙂
Thanks for the PR! I started using it and it works great. However, I guess the following might be considered (@Maxris, ping me for help if you want), ordered by severity:
Hey @janchorowski, great findings! 2 and 3 sound reasonable. At the moment I'm looking at point 3 to figure out the best way to configure the OAuth2 token to pass
@janchorowski I'm going to address the things you mentioned above as the next step. I will just grab your changes from https://github.com/janchorowski/dvc/blob/2ef9a4c0a03923567c86df81be833eeb96f71b7d/dvc/remote/gdrive/__init__.py#L38-L52 and give them a try locally. Thanks!
Thank you, the proposed solution seems to work great so far! I'm going to do more testing with a larger number of files.
According to the doc from the link you mentioned (https://rclone.org/drive/#making-your-own-client-id), "The default Google quota is 10 transactions per second". Since DVC sends Intermediate directories are already cached and their remote IDs are retrieved from runtime cache collection ( Could you please let me know if you see a way to reduce the number of API queries further?
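As a rough illustration of the directory-ID caching mentioned above, the sketch below resolves a remote path one directory at a time and remembers each directory's ID, so a repeated lookup of the same prefix costs no further API queries. Everything in it is hypothetical: `DirIdCache`, `lookup_child`, and the fake in-memory tree are stand-ins for illustration, not the code in this PR or any PyDrive API.

```python
class DirIdCache:
    """Resolve 'a/b/c' style paths to remote IDs, caching every prefix."""

    def __init__(self, lookup_child, root_id="root"):
        # lookup_child(parent_id, name) -> child_id is an assumed callable
        # that would perform one API query per call.
        self._lookup_child = lookup_child
        self._cache = {"": root_id}

    def resolve(self, path):
        parts = [p for p in path.split("/") if p]
        current = ""
        for part in parts:
            parent_id = self._cache[current]
            current = f"{current}/{part}" if current else part
            if current not in self._cache:
                # Only paths not seen before trigger a (simulated) query.
                self._cache[current] = self._lookup_child(parent_id, part)
        return self._cache[current]


# Fake remote tree and a call log standing in for real API queries.
TREE = {
    ("root", "data"): "id-data",
    ("id-data", "ab"): "id-ab",
    ("id-data", "cd"): "id-cd",
}
calls = []


def fake_lookup(parent_id, name):
    calls.append((parent_id, name))
    return TREE[(parent_id, name)]


cache = DirIdCache(fake_lookup)
cache.resolve("data/ab")  # two queries: "data", then "ab"
cache.resolve("data/cd")  # one query: "data" is already cached
assert len(calls) == 3
```

With a cache like this, the number of queries grows with the number of distinct directories rather than with the number of files pushed.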
Going to create docs for this.
The PR should be ready for review. Known issues so far:
I might implement the above right after creating the docs PR and add it to the current PR.
Looks good! Please see some minor comments up above.
if remote_ids:
    return remote_ids[0]
Is this deterministic between runs? Or will it return a random id?
I was not able to find relevant info about the order of results returned by Drive API v2.
https://developers.google.com/drive/api/v2/reference/files/list#parameters
The link above says the following about the optional orderBy param: "Please note that there is a current limitation for users with approximately one million files in which the requested sort order is ignored."
So it seems DVC can't rely on orderBy because of that.
So far I haven't noticed results being returned in a different order.
We shouldn't run into millions of files; we could just sort and take the minimum ID instead of the first one to make it deterministic. I will add it to #2865.
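The sort/take-the-minimum idea above could look roughly like the following sketch. It assumes `remote_ids` is a list of ID strings collected from a list query; the helper name `pick_remote_id` is hypothetical and not part of the PR.

```python
def pick_remote_id(remote_ids):
    """Return a stable choice from a list of candidate remote IDs.

    Taking the minimum (lexicographic order for strings) does not depend
    on the order in which the API happened to return the results.
    """
    if not remote_ids:
        return None
    return min(remote_ids)


# The same set of IDs in two different orders yields the same pick.
first_run = ["1b", "0z", "1a"]
second_run = ["1a", "1b", "0z"]
assert pick_remote_id(first_run) == pick_remote_id(second_run) == "0z"
```

Unlike `remote_ids[0]`, the result here depends only on which IDs exist, not on the order of the response.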
@Maxris Please don't forget to install your git hooks, so that restyled doesn't have to create all of those annoying PRs each time 🙂
Looks great! For the record, #2865 is for the rest of TODOs.
Approved! Thanks, @Maxris 🎉
@shcheklein it seems this conversation happened in #2551 (comment) and was already added to #2865 as number 2 in the list of to-dos.
@Maxris kk, I see. Thanks!
Thanks, @Maxris, this was big.
Added a couple of points to #2865
Have you followed the guidelines in our
Contributing document?
Does your PR affect documented changes or does it add new functionality
that should be documented? If yes, have you created a PR for
dvc.org documenting it or at
least opened an issue for it? If so, please add a link to it.
Docs update will be tracked under iterative/dvc.org#381
The user should create a local GDrive settings file (in the format expected by PyDrive) and put their own Google Project credentials into the newly created settings file. For test purposes, the following settings file content might be used:
The user should then configure the DVC repo with a GDrive remote and the path to the settings file created in the previous step, as follows:
The above configuration will push DVC files to the newly created directory test located in the user's GDrive root directory.
Implementation details:
The PyDrive library is used to access Google Drive API v2. To handle and avoid API usage limit errors, the ratelimit and backoff libraries are used.
PYDRIVE_USER_CREDENTIALS_DATA can be used to store the user account's token data to skip the manual login action. This should be useful for automated tests, but the actual value of PYDRIVE_USER_CREDENTIALS_DATA is supposed to be encrypted and secured from unauthorized usage.
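To make the rate-limit handling described in the implementation details more concrete, here is a minimal standard-library sketch of the two ideas: spacing calls out to stay under a quota, and retrying with exponential backoff when a call still fails. It does not use the ratelimit or backoff libraries themselves, and the names `throttled`, `retry_with_backoff`, `RateLimitError`, and `flaky_upload` are hypothetical, not taken from the PR.

```python
import functools
import time


class RateLimitError(Exception):
    """Stand-in for an API 'usage limit exceeded' error."""


def throttled(calls_per_second):
    """Space out calls so consecutive calls start at least 1/rate apart.

    Single-threaded sketch: no locking is done around the timestamp.
    """
    min_interval = 1.0 / calls_per_second

    def decorator(func):
        last_call = [None]  # monotonic timestamp of the previous call

        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            if last_call[0] is not None:
                wait = min_interval - (time.monotonic() - last_call[0])
                if wait > 0:
                    time.sleep(wait)
            last_call[0] = time.monotonic()
            return func(*args, **kwargs)

        return wrapper

    return decorator


def retry_with_backoff(max_tries=5, base_delay=0.01):
    """Retry on RateLimitError, doubling the delay after each failure."""

    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            delay = base_delay
            for attempt in range(1, max_tries + 1):
                try:
                    return func(*args, **kwargs)
                except RateLimitError:
                    if attempt == max_tries:
                        raise  # give up after the last allowed attempt
                    time.sleep(delay)
                    delay *= 2

        return wrapper

    return decorator


attempts = []


@retry_with_backoff(max_tries=4)
@throttled(calls_per_second=10)
def flaky_upload(name):
    """Simulated API call: fails twice with a rate-limit error, then succeeds."""
    attempts.append(name)
    if len(attempts) < 3:
        raise RateLimitError("quota exceeded")
    return f"uploaded {name}"


result = flaky_upload("file.bin")
assert result == "uploaded file.bin"
assert len(attempts) == 3
```

The throttle sits inside the retry wrapper so that retried calls are also spaced out and count against the same quota.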