Containerized Execution of workflows
Kerblam! can ergonomically run workflow managers inside containers for you, making it easier to be reproducible.
This is very useful for those cases where you just need a small bash script to run your analysis, but still wish to be reproducible. If you use more feature-rich workflow managers, they can handle containers for you, making Kerblam-handled containers less useful.
If Kerblam! finds a container recipe (such as a Dockerfile) of the same name
as one of your workflows in the ./src/dockerfiles/
folder
(e.g. ./src/dockerfiles/process_csv.dockerfile
for the
./src/workflows/process_csv.makefile
workflow),
it will use it automatically when you execute the workflow manager for that
workflow (e.g. kerblam run process_csv
) to run it inside a container.
Kerblam! will do something similar to this (for an example makefile):
- Copy the workflow file to the root of the directory (as it does normally when you
launch
kerblam run
), as./executor
; - Run
docker build -f ./src/dockerfiles/process_csv.dockerfile --tag process_csv_kerblam_runtime .
to build the container; - Run
docker run --rm -it -v ./data:/data --entrypoint make process_csv_kerblam_runtime -f /executor
.
This last command runs the container, telling it to execute make
with
target file -f /executor
.
Note that it's not exactly what kerblam does - it has additional features
to correctly mount your paths, capture stdin
and stdout
, etc...
meaning that it works transparently with your other settings and profiles.
If you have your docker container COPY . .
, you can then effectively have
Kerblam! run your projects in docker environments, so you can tweak your
dependencies and tooling (which might be different than your dev environment)
and execute even small analyses in a protected, reproducible environment.
Kerblam! will build the container images without moving the recipies around
(this is what the -f
flag does).
The .dockerignore
in the build context (next to the kerblam.toml
) is shared
by all pipes.
See the 'using a dockerignore' section
of the Docker documentation for more.
You can write dockerfiles for all types of workflows. Kerblam! configures automatically the correct entrypoint and arguments to run the pipe in the container for you.
Read the "writing dockerfiles for Kerblam!" section to learn more about how to write dockerfiles that work nicely with Kerblam! (spoiler: it's easier than writing canonical dockerfiles!).
Listing out available workflows
If you run kerblam run
without a workflow (or with a non-existant workflow),
you will get the list of available workflows.
You can see at a glance what workflows have an associated dockerfile as they
are prepended with a little whale (🐋):
Error: No runtime specified. Available runtimes:
🐋◾ my_workflow :: Generate the output data in a docker container
◾◾ local_workflow :: Run some code locally
Available profiles: No profiles defined.
Default dockerfile
Kerblam! will look for a default.dockerfile
if it cannot find a container
recipe for the specific pipe (e.g. pipe.dockerfile
), and use that instead.
You can use this to write a generalistic dockerfile that works for your
most simple workflows.
The whale (🐋) emoji in the list of pipes will be replaced by a fish (🐟) for
pipes that use the default container, so you can identify them easily:
Error: No runtime specified. Available runtimes:
🐋◾ my_workflow :: Generate the output data in a docker container
🐟◾ another :: Run in the default container
Available profiles: No profiles defined.
Switching backends
Kerblam! runs containers by default with Docker, but you can tell it to use
Podman instead by setting the execution > backend
option in your kerblam.toml
:
[execution]
backend = "podman" # by default "docker"
Podman is slightly harder to set up but has a few benefits, mainly not having
to run in root mode, and being a FOSS program.
For 90% of usecases, you can use podman
instead of docker
and it will
work exactly the same.
Podman and Docker images are interchangeable, so you can use Podman with
dockerhub with no issues.
Setting the container working directory
Kerblam! does not parse your dockerfile or add any magic to the calls that it makes based on heuristics. This means that if you wish to save your code not in the root of the container, you must tell kerblam! about it.
For instance, this recipe copies the contents of the analysis in a folder
called "/app
":
COPY . /app/
This one does the same by using the WORKDIR
directive:
WORKDIR /app
COPY . .
If you change the working directory, let Kerblam! know by setting the
execution > workdir
option in kerblam.toml
:
[execution]
workdir = "/app"
In this way, Kerblam! will run the containers with the proper paths.
important
This option applies to ALL containers managed by Kerblam!
There is currently no way to configure a different working directory for every specific dockerfile.
Skipping using cache
Sometimes, you want to skip using the build cache when executing a workflow with a container executable.
Using kerblam run my_workflow --no-build-cache
will do just that: the
build backend will be told not to use the cached layers for that build (with
the --no-cache
flag).
Example
For example, consider a makefile named process_csv.makefile
that uses a
Python script to process CSV files. You could have the following Dockerfile:
# ./src/dockerfiles/process_csv.dockerfile
FROM ubuntu:latest
RUN apt-get install python, python-pip && \
pip install pandas
COPY . .
and this dockerignore file:
# ./src/dockerfiles/.dockerignore
.git
data
venv
and simply run kerblam run process_csv
to build the container and run
your code inside it.