Pipelines
Kerblam! is first and foremost a pipeline runner.
Say that you have a script in ./src/calc_sum.py
. It takes an input .csv
file,
processes it, and outputs a new .csv
file, using stdin
and stdout
.
You have an input.csv
file that you'd like to process with calc_sum.py
.
You could write a shell script or a makefile with the command to run.
We'll refer to these scripts as "pipes".
Here's an example makefile pipe:
./data/out/output.csv: ./data/in/input.csv ./src/calc_sum.py
cat $< | ./src/calc_sum.py > $@
You'd generally place this file in the root of the repository and run make
to execute it.
This is perfectly fine for projects with a relatively simple structure and just one execution pipeline.
Imagine however that you have to change your pipeline to run two different
jobs which share a lot of code and input data but have slightly (or dramatically)
different execution.
You might modify your pipe to accept if
statements, use environment variables
or perhaps write many of them and run them separately.
In any case, having a single file that has the job of running all the different
pipelines is hard, adds complexity and makes managing the different execution
scripts harder than it needs to be.
Kerblam! manages your pipes for you.
You can write different makefiles and/or shell files for different types of
runs of your project and save them in ./src/pipes/
.
When you kerblam run
, Kerblam! looks into that folder, finds (by name) the
makefiles that you've written, and brings them to the top level of the project
(e.g. ./
) for execution.
In this way, you can write your pipelines as if they were in the root of
the repository, cutting down on a lot of boilerplate paths.
For instance, you could have written a ./src/pipes/process_csv.makefile
for
the previous step, and you could invoke it with kerblam run process_csv
.
You could then write more makefiles or shell files for other tasks and run
them similarly, keeping them all neatly separated from the rest of the code.
The next sections outline the specifics of how Kerblam! executes pipes.