Packaging workflows for later

The kerblam package command is one of the most useful features of Kerblam! It allows you to package everything needed to execute a workflow in a docker container and export it for execution later.

As with kerblam run, this is chiefly useful for those times where the workflow manager of your choice does not support such features, or you do not wish to use a workflow manager.

You must have a matching dockerfile for every workflow that you want to package, or Kerblam! won't know what to package your workflow into.

For example, say that you have a pipe named process that runs with make, and requires both a remotely-downloaded remote.txt file and a local-only precious.txt file.
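As a concrete sketch, the example project might be laid out like this. The paths below assume Kerblam!'s default directory layout (input data under data/in, pipes and dockerfiles under src/); your own layout may differ if kerblam.toml overrides these defaults:

```shell
# Hypothetical layout for the example project (paths assume Kerblam!'s
# defaults; adjust if your kerblam.toml says otherwise).
mkdir -p project/data/in project/src/pipes project/src/dockerfiles
touch project/kerblam.toml
touch project/src/pipes/process.makefile          # the make-based "process" pipe
touch project/src/dockerfiles/process.dockerfile  # its matching dockerfile
touch project/data/in/precious.txt                # local-only, cannot be re-fetched
touch project/data/in/remote.txt                  # fetched by kerblam data fetch
ls -R project
```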

If you execute:

kerblam package process --tag my_process_package

Kerblam! will:

  • Create a temporary build context;
  • Copy all non-data files to the temporary context;
  • Build the specified dockerfile as normal, but using this temporary context;
  • Create a new Dockerfile that:
    • Inherits from the image built before;
    • Copies the Kerblam! executable to the root of the container;
    • Configures the default execution command to something suitable for execution (just like kerblam run does, but "baked in").
  • Build the docker container and tag it with my_process_package;
  • Export all precious data, the kerblam.toml and the --tag of the container to a process.kerblam.tar tarball.

The --tag parameter is a docker tag. You can specify a remote repository with it (e.g. my_repo/my_container) and push it with docker push ... (or podman) as you would normally do.

tip

If you don't specify a --tag, Kerblam! tags the resulting container as <pipe>_exec.

Replaying packaged projects

After Kerblam! packages your project, you can re-run the analysis with kerblam replay by using the process.kerblam.tar file:

kerblam replay process.kerblam.tar ./replay_directory

Kerblam! reads the .kerblam.tar file, unpacks the packaged data to recreate the execution environment, and executes the exported docker container with the proper mountpoints (as described in the kerblam.toml file).

In the container, Kerblam! fetches remote files (i.e. runs kerblam data fetch) and then triggers the workflow via kerblam run. Since the container's output folder is mounted to a directory on disk, the final output of the workflow is saved locally.

These packages are meant to make workflows reproducible in the long-term. For day-to-day runs, kerblam run is much faster.

important

Keeping the resulting docker container working in the long term is your responsibility, not Kerblam!'s. In most cases, if kerblam run works, the package made by kerblam package will work too, but depending on your dockerfiles this might not be the case. Kerblam! does not test the resulting package - that's up to you. It's best to run your packaged workflow once before shipping it off.

However, even a broken kerblam package is still useful! You can always override the container's entrypoint (e.g. with docker run --entrypoint bash) and work inside the container interactively, manually fixing any issues that time or a wrong setup might have introduced.

Kerblam! respects your execution options when it packages, changing the backend or working directory as you'd expect. See the kerblam.toml specification to learn more.