Initialize data version control for managing test images #1036
Adding dvc package to environment.yml and running `dvc init` to get the barebones .dvcignore, .dvc/config & .dvc/.gitignore files.
Force-pushed from ea5444a to 7e0940c.
Should this update |
We don't have a requirements-dev.txt file anymore as of #812 😄. I wanted to put it in environment.yml but ran into some CI issues, so we might need to finish off #1033 first. Oh, and we will need to document how to use dvc somewhere too. I'm still getting my head around how to make it as easy as possible for everyone. |
Ok, I've added a "Using data version control (dvc)" subsection to CONTRIBUTING.md at 6bd7ba9. Have a read, see if it makes sense, and maybe try and test this Pull Request out! Feel free to ask any questions too. |
Sorry, I had to restart the regular GMT 6.1.1/Python 3.9 test. I tried triggering the GMT dev version test at #1036 (comment), and it looks like test_logo*.py failed (see https://github.com/GenericMappingTools/pygmt/actions/runs/662613609) because |
Force-pushed from d56a31e to c37bdff.
I saw you just migrated test_image.png to dvc. We also need to update the test_image.py script:
|
My last concern is, how can reviewers review the image changes? When someone runs |
No need to do a |
I just added a new image to my testing branch (commit b8bdb7c in #1071). However, I can't preview the image in DAGsHub: https://dagshub.com/GenericMappingTools/pygmt/src/fix-test-basemap-polar/pygmt/tests/baseline |
We may also need to update the maintenance guides about how to review PRs with dvc images. We can do it after we have more experience with the new workflow. |
Will let the full tests run before merging 🚀, thanks for the review and testing things out thoroughly!
Yep, will do that in a follow up PR. I'll post a comment on #963 to summarize the new workflow for everyone else on the team. |
…ingTools#1036)

Using a data version control package called [`dvc`](https://github.com/iterative/dvc) to manage the PNG test images in the PyGMT repo! In a nutshell, store only the hash of the PNG on GitHub (in a *.png.dvc file), while having the actual PNG stored on DAGsHub at https://dagshub.com/GenericMappingTools/pygmt.

* Initialize data version control

  Adding dvc package to environment.yml and running `dvc init` to get the barebones .dvcignore, .dvc/config & .dvc/.gitignore files.
* Set dvc remote as https://dagshub.com/GenericMappingTools/pygmt.dvc
* Temporarily installing dvc using pip instead of conda to make CI work
* Refactor test_logo to use mpl_image_compare and track png files in dvc
* Add dvc pull as a step in ci_tests.yaml to pull in data
* List files in pygmt/tests/baseline/ to see what happens after dvc pull
* Do `dvc pull` before `pip install dist/*` otherwise test PNGs aren't there
* First draft of instructions for using dvc to store baseline images
* Instruct to do `git push` first and then `dvc push`

  Technically the order shouldn't matter, but most tutorials seem to use `git push` first so follow that.
* New checklist item for maintainers to get added to DAGsHub dvc remote
* Move pygmt/tests/baseline/.gitignore to top-level
* Clarify that `git rm -r --cached` only needs to run during migration
* Try installing dvc from conda again now that there is a Py3.9 package
* Install dvc and do `dvc pull` on GMT dev tests too
* Refactor test_logo tests to be simpler and more unit-test like
* Mention dvc status command to see which files need staging
* Update test_image to use SI units and long aliases

Co-authored-by: Dongdong Tian <[email protected]>
Description of proposed changes
Using a data version control package called `dvc` to manage the PNG test images in our repository!

In a nutshell, we will only store the hash of the PNG on GitHub (in a *.png.dvc file) and the actual PNG will be stored at https://dagshub.com/GenericMappingTools/pygmt (see e.g. https://dagshub.com/GenericMappingTools/pygmt/src/data_version_control/pygmt/tests/baseline/test_logo_on_a_map.png).
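To make the "store only the hash" idea concrete, here is a minimal Python sketch of the concept. It is not how dvc is implemented, and the helper name `write_pointer` and the exact fields written are assumptions for illustration; in practice the `*.png.dvc` file is produced by `dvc add` and its contents may differ.

```python
import hashlib
import tempfile
from pathlib import Path


def write_pointer(png_path: Path) -> Path:
    """Write a small '<name>.png.dvc'-style pointer file next to png_path.

    Hypothetical helper for illustration only: the real pointer file is
    created by `dvc add`, and its layout is an assumption here.
    """
    data = png_path.read_bytes()
    digest = hashlib.md5(data).hexdigest()  # content hash that identifies the image
    pointer = png_path.with_name(png_path.name + ".dvc")
    pointer.write_text(
        "outs:\n"
        f"- md5: {digest}\n"
        f"  size: {len(data)}\n"
        f"  path: {png_path.name}\n"
    )
    return pointer


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        image = Path(tmp) / "test_logo.png"
        # Stand-in bytes, not a real image; any file content works for hashing.
        image.write_bytes(b"\x89PNG\r\n\x1a\n" + b"\x00" * 1024)
        pointer = write_pointer(image)
        print(pointer.name)
        print(pointer.read_text())
```

Only the few-line text file would be committed to Git; the PNG itself would be ignored by Git and uploaded to the remote storage instead, so image changes show up in a diff as a changed hash.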
List of commands run so far on this 'data_version_control' branch (will update as I test things out):
Background (only needed to be done once for this repo)
Installing DVC for developing PyGMT
DVC init
Setup DVC remote
When a new test is being written (Everyone will need to run)
DVC remote authentication config (changes .dvc/config.local)
Generate new baseline PNG images
Push images to DVC remote
Pull PNG images from DVC remote (needed for GitHub Actions CI)
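The generate/push steps above depend on knowing which baseline images no longer match their recorded hash, which is roughly what the `dvc status` command mentioned in the commit log reports. As a rough Python sketch of that comparison only (the helper `stale_images` is hypothetical, not part of dvc, and it assumes each pointer file contains a line of the form `md5: <hash>`):

```python
import hashlib
import re
from pathlib import Path


def stale_images(baseline_dir: Path) -> list[str]:
    """Return names of PNGs whose MD5 differs from the one in '<name>.png.dvc'.

    Hypothetical helper: assumes a simplified pointer layout with an
    'md5: <32 hex digits>' line. PNGs without a pointer file are reported too.
    """
    stale = []
    for png in sorted(baseline_dir.glob("*.png")):
        pointer = png.with_name(png.name + ".dvc")
        actual = hashlib.md5(png.read_bytes()).hexdigest()
        recorded = None
        if pointer.exists():
            match = re.search(r"md5:\s*([0-9a-f]{32})", pointer.read_text())
            recorded = match.group(1) if match else None
        if recorded != actual:
            stale.append(png.name)  # changed, or never tracked
    return stale
```

Calling `stale_images(Path("pygmt/tests/baseline"))` would list the images that still need to be re-added and pushed. An empty list means there is nothing new to upload for that commit.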
Note: Some of the `dvc` commands may not be necessary if we `dvc install` some git hooks so that `dvc` add/checkout/push actions happen simply with `git` add/checkout/push, but I need to try this out a bit more. One foreseeable con with this solution is that `dvc` may (?) run even for commits where no PNG images change, adding an extra layer of slowness.

References:
Addresses #963
Reminders
Run `make format` and `make check` to make sure the code follows the style guide.
`doc/api/index.rst`.
.Slash Commands
You can write slash commands (`/command`) in the first line of a comment to perform specific operations. Supported slash commands are:

* `/format`: automatically format and lint the code
* `/test-gmt-dev`: run full tests on the latest GMT development version