Cargo manifest & build system needed #692

Open · o-smirnov opened this issue Dec 11, 2020 · 9 comments

@o-smirnov (Collaborator):

The current situation around builds and tags is a little unsatisfactory. Images are built and tagged by hand, and there's no way to ensure that (for any given version of Stimela) a consistent set of images is built and pushed to the registry -- we must Just Have Faith that one of the developers has done it correctly. It gets even messier now that we're allowing multiple versions and multiple tags to be shipped.

It's also not trivial to capture the particular image versions used for any given pipeline run (which we need for our promised reproducibility!).

I propose something along the following lines:

  • add a file called (cargo) manifest.yml which lists the image versions (and, optionally, Dockerfiles) that constitute the current state of Stimela. E.g.
base:
  1.6.0: base/base            # implying cargo/base/base/Dockerfile is to be used
cubical:
  1.5.7: base/cubical/Dockerfile.1.5.7   # non-default Dockerfile to build an older version
  1.5.8: base/cubical
  • add a stimela build command that rebuilds everything according to the current manifest file

  • remove image tags from the parameters.json of individual cabs, unless multiple versions need to be shipped. Default behaviour should be just to use the version from the manifest.

  • manifest.yml is to be read at startup into an in-memory manifest dict.

  • we already have an API in place that we use from Caracal to ask Stimela to use non-default images. This call will then modify the in-memory manifest dict.

  • add an API call e.g. stimela.get_manifest() that returns this dict. Caracal can store this in its output directory.

  • add an API call e.g. stimela.set_manifest() to use a non-default manifest. This can then be used to "reproduce" historical runs (a sketch of all this follows below).
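To make the above concrete, here's a minimal sketch of the manifest handling, assuming PyYAML and the manifest.yml layout shown above. The module layout, helper names, and the MANIFEST_FILE location are illustrative assumptions; only get_manifest() and set_manifest() come from the proposal itself:

```python
import copy
import os

import yaml  # PyYAML

# assumed location of the manifest, relative to this (hypothetical) module
MANIFEST_FILE = os.path.join(os.path.dirname(__file__), "cargo", "manifest.yml")

# in-memory manifest dict, populated at startup
_manifest = {}

def load_manifest(path=MANIFEST_FILE):
    """Read manifest.yml into the in-memory manifest dict at startup."""
    global _manifest
    with open(path) as f:
        _manifest = yaml.safe_load(f)
    return _manifest

def get_manifest():
    """Return a copy of the current manifest, so that e.g. Caracal can
    store it in its output directory alongside a pipeline run."""
    return copy.deepcopy(_manifest)

def set_manifest(manifest):
    """Install a non-default manifest, e.g. one saved from a historical
    run, so that run can be reproduced with the same image versions."""
    global _manifest
    _manifest = copy.deepcopy(manifest)
```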

o-smirnov self-assigned this Dec 11, 2020
@gigjozsa (Collaborator):

Following this, the version in parameters.json and the image tags would be identical to the release tags of the given software. How do we make sure there are no accidents where two different images end up with the same tag? Is this checked automatically in the Docker registry, or should we add a check (e.g. a checksum)?

@o-smirnov (Collaborator, Author):

No, the registry is libertarian. It will always accept an upload and let you overwrite any given image.

But before you upload, you have to build. I think the right thing to do is to check for an existing image locally with docker image inspect, and refuse to rebuild unless --force is specified.
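A rough sketch of that check, as an illustration rather than actual Stimela code (the function names and the build invocation are made up; only the docker image inspect idea comes from the comment above):

```python
import subprocess

def image_exists(image):
    """Return True if `image` (name:tag) is already present locally.
    `docker image inspect` exits nonzero when the image doesn't exist."""
    result = subprocess.run(
        ["docker", "image", "inspect", image],
        stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
    return result.returncode == 0

def build_image(image, dockerfile_dir, force=False):
    """Build `image`, refusing to clobber an existing local image
    unless force (i.e. --force) is given."""
    if image_exists(image) and not force:
        raise RuntimeError(f"{image} already exists locally; "
                           "refusing to rebuild (use --force)")
    subprocess.run(["docker", "build", "-t", image, dockerfile_dir],
                   check=True)
```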

Which reminds me:

  • add labels to images during the stimela build process, so that information such as "built by USER on HOST on DATE/TIME" can be retrieved from the image
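Illustratively, such provenance labels could be attached at build time via docker's --label option (the stimela.* label keys here are assumptions, not an established convention):

```python
import datetime
import getpass
import socket
import subprocess

def build_with_labels(image, dockerfile_dir):
    """Build `image`, attaching provenance labels that can later be
    read back with `docker inspect`."""
    labels = {
        "stimela.built_by": getpass.getuser(),           # "built by USER"
        "stimela.built_on_host": socket.gethostname(),   # "... on HOST"
        "stimela.built_at": datetime.datetime.utcnow().isoformat(),  # "... on DATE/TIME"
    }
    args = ["docker", "build", "-t", image]
    for key, value in labels.items():
        args += ["--label", f"{key}={value}"]
    subprocess.run(args + [dockerfile_dir], check=True)
```

The labels can then be read back with e.g. docker inspect --format '{{json .Config.Labels}}' IMAGE.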

@gigjozsa (Collaborator):

In addition, would it be useful to introduce the convention of a tag with 4 numbers, the first three reflecting the version number of the software, the last the actual build, starting at 0?

Then add a stimela imbuild command, which is a docker build but checks for the existing tags, such that any time someone rebuilds an image, the tags in the manifest get updated automatically?

stimela imbuild cabname w.x.y:

  • Get the current tag w.x.y.z from the manifest. If the image does not exist, use w.x.y.0
  • If it does exist, pull image w.x.y.z from the registry
  • Build the new image with docker build and tag it tmp
  • If tmp is not identical to w.x.y.z, retag it as w.x.y.z+1 and update the manifest

That way there's no hassle with images in the registry being overwritten under the same name? (A sketch of this logic follows below.)
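A sketch of the proposed imbuild logic, with two caveats: it compares images via their local image IDs (the thread doesn't settle on how "identical" should be checked), and it skips the pull-from-registry step:

```python
import subprocess

def image_id(image):
    """Return the local image ID for `image`, or None if it isn't present."""
    result = subprocess.run(
        ["docker", "image", "inspect", "--format", "{{.Id}}", image],
        capture_output=True, text=True)
    return result.stdout.strip() if result.returncode == 0 else None

def imbuild(cabname, version, manifest, dockerfile_dir):
    """Build cab `cabname` at software version w.x.y, bumping the trailing
    build number z only when the rebuilt image actually differs."""
    current = manifest.get(cabname, {}).get(version)  # e.g. "1.5.8.2"
    subprocess.run(["docker", "build", "-t", f"{cabname}:tmp", dockerfile_dir],
                   check=True)
    if current and image_id(f"{cabname}:tmp") == image_id(f"{cabname}:{current}"):
        return current                    # identical: keep the existing tag
    if current:                           # differs: bump z
        *head, z = current.split(".")
        tag = ".".join(head + [str(int(z) + 1)])
    else:                                 # first build of this version
        tag = f"{version}.0"
    subprocess.run(["docker", "tag", f"{cabname}:tmp", f"{cabname}:{tag}"],
                   check=True)
    manifest.setdefault(cabname, {})[version] = tag
    return tag
```

(Note that docker builds are generally not bit-reproducible, so an image-ID comparison would bump z on nearly every rebuild.)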

@o-smirnov (Collaborator, Author):

Hmm, my first take is that four numbers starts to look overcomplicated. What is the use case for multiple incremental builds of the same underlying package version? If something was broken in the previous image, then surely you ought to be overwriting that image with a working one. And if nothing was broken, why are you rebuilding?

Note that there's nothing wrong with tacking an ad hoc suffix onto the version, e.g. 1.6.8-b1, should some unusual case arise.

(And not all software uses x.y.z versioning anyway...)

@gigjozsa (Collaborator):

You are right. The solution is indeed to stick to the release numbers in the manifest. The suggestion stemmed from the fact that, so far, no version numbers for the software in the images have been recorded, which meant the images could not be tracked: the same tag could point to different content at different times.

Btw., if at all possible, the software version(s) should probably be evident from the Dockerfiles, right?

@o-smirnov (Collaborator, Author):

> Btw., if at all possible, the software version(s) should probably be evident from the Dockerfiles, right?

Good question. Yes, when we pip install from a repo (see e.g. the cubical cab). Not if it's a KERN package though. Then it becomes a question of e.g. "what is the wsclean version provided by KERN-6 at time of build". Which is not exactly transparent or ideal, and I'm not sure how to solve it yet...
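Purely illustrative, and not a solution settled in this thread: one could at least record what actually got installed by querying dpkg inside the freshly built image (the image and package names below are hypothetical):

```python
import subprocess

def installed_deb_version(image, package):
    """Ask dpkg inside `image` which version of `package` is installed."""
    result = subprocess.run(
        ["docker", "run", "--rm", image,
         "dpkg-query", "-W", "-f", "${Version}", package],
        capture_output=True, text=True, check=True)
    return result.stdout.strip()

# e.g. installed_deb_version("stimela/wsclean:1.6.0", "wsclean")
```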

@SpheMakh (Collaborator):

The version of the software is in the parameters.json file.

@o-smirnov (Collaborator, Author):

> The version of the software is in the parameters.json file.

Yep, but it's only there because one of the devs put it in by hand, right? I'm trying to figure out if there's a way to check this automatically in the image build process. At the moment, if, say, @Athanaseus uploads a new version of pybdsf to KERN-6 (which I know he shouldn't, but in principle he could, and it's outside of Stimela's control), and you rebuild the image from the Dockerfile, suddenly you have a new version of pybdsf in there, and you don't even know it's changed unless you happen to do a little digging.

Not a problem for pip installs from github, since we generally use an @vX.Y.Z tag for that.

@o-smirnov (Collaborator, Author):

See discussion thread: #696
