Tutorial
I assume you've already gone through the install and configure instructions. Let's initialize a bulker config file for this tutorial:
rm "bulker_config.yaml"
export BULKERCFG="bulker_config.yaml"
bulker init -c $BULKERCFG
rm: cannot remove 'bulker_config.yaml': No such file or directory
Guessing container engine is docker.
Wrote new configuration file: bulker_config.yaml
Terminology
Let's start with a few terms:
- crate. A collection of containerized executables. A crate is analogous to a docker image (but it provides multiple commands by pointing to multiple images).
- manifest. A manifest defines a crate. It is a list of commands and images to be included in the crate. A manifest is analogous to a Dockerfile. It could be thought of as a Cratefile.
- load. Loading a manifest will create a local folder with executables for each command in the manifest. Loading a manifest is analogous to building or pulling an image.
- activate. Activating a crate is what allows you to run the commands in a crate. Activating is analogous to starting a container. Any loaded crates are available to activate. Activating a crate does nothing more than prepend the crate folder to your PATH variable.
The executables are created upon load, and merely added to your PATH upon activate. Therefore, if you adjust settings in your bulker config (such as VOLUMES or ENVVARS), you'll need to re-load the manifests to update the actual bulker containerized executables. As of version 0.7.0, you can do this easily with bulker reload after a config file change.
Loading crates
With bulker installed and configured, type bulker list to see what crates you have available. If you've not loaded anything, it should be empty:
bulker list
Bulker config: /home/nsheff/code/bulker/docs_jupyter/bulker_config.yaml
Available crates:
No crates available. Use 'bulker load' to load a crate.
Let's load a demo crate. There are two ways to load a manifest: from a bulker registry, or directly from a file.
Using a bulker registry
Here's a manifest that describes 2 commands:
manifest:
  name: demo
  version: 1.0.0
  commands:
  - command: cowsay
    docker_image: nsheff/cowsay
    docker_command: cowsay
    docker_args: "-i"
  - command: fortune
    docker_image: nsheff/fortune
    docker_command: fortune
This manifest is located in the bulker registry, under the name bulker/demo. Here 'bulker' is the namespace (think of it as the group name) and 'demo' is the name of the crate to load. Since 'bulker' is the default namespace, you can load it like this:
bulker load demo
Bulker config: /home/nsheff/code/bulker/docs_jupyter/bulker_config.yaml
Loading manifest: 'bulker/demo:default'. Activate with 'bulker activate bulker/demo:default'.
Commands available: cowsay, fortune
Doing bulker load bulker/demo:default would do the same thing. That's how you load any crate, from any namespace, from the registry.
Loading crates from a file
You can also load any manifest by pointing to the yaml file with the -f argument:
bulker load demo -f http://big.databio.org/bulker/bulker/demo.yaml
Here, the registry path ('demo') indicates to bulker what you want to name this crate. You can name it whatever you want, since you're loading it directly from a file and not from the registry. For example, you can do bulker load myspace/mycrate -f /path/to/file.yaml.
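As a rough sketch of loading from a local file, the snippet below writes a small manifest to a temporary path. The field names follow the demo manifest shown earlier; the crate name myspace/mycrate, the hello command, and the image name example/hello are hypothetical placeholders, not real registry entries. The final bulker load line is left commented out because it needs bulker and a container engine to be present.

```shell
set -eu

# Hypothetical location for the manifest; any readable path would do.
manifest_file="$(mktemp -d)/mycrate.yaml"

# Same structure as the demo manifest; the command and image are placeholders.
cat > "$manifest_file" <<'EOF'
manifest:
  name: mycrate
  version: 1.0.0
  commands:
  - command: hello
    docker_image: example/hello
    docker_command: hello
EOF

echo "Wrote manifest to: $manifest_file"

# With bulker installed, load it under a name of your choosing:
# bulker load myspace/mycrate -f "$manifest_file"
```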
Once you've loaded a crate, if you type bulker list you should see the demo crate available for activation. But first, let's point out the -b argument, which you can pass to bulker load. By default, all bulker load does is create a folder of executables. It does not actually pull or build any images. Docker will automatically pull these by default as soon as you use them, which is nice, but you might rather just grab them all now instead of waiting for that. In this case, just pass -b to your bulker load command:
bulker load demo -b -f
Bulker config: /home/nsheff/code/bulker/docs_jupyter/bulker_config.yaml
Building images with template: /home/nsheff/code/bulker/docs_jupyter/templates/docker_build.jinja2
Removing all executables in: /home/nsheff/bulker_crates/bulker/demo/default
Using default tag: latest
latest: Pulling from nsheff/cowsay
Digest: sha256:14fa1f533678750afd09536872e068e732ae4f735c52473450495d5af760c2e3
Status: Image is up to date for nsheff/cowsay:latest
docker.io/nsheff/cowsay:latest
Docker image available as: nsheff/cowsay
Using default tag: latest
latest: Pulling from nsheff/fortune
Digest: sha256:a980b4b333a8b89acf4c2fe90dde5da93898ab574a6d2e88152398724667957b
Status: Image is up to date for nsheff/fortune:latest
docker.io/nsheff/fortune:latest
Docker image available as: nsheff/fortune
Loading manifest: 'bulker/demo:default'. Activate with 'bulker activate bulker/demo:default'.
Commands available: cowsay, fortune
Now, bulker will instruct docker (or singularity) to pull all the images required for all the executables in this crate. (The -f just forces an overwrite without prompting.) Now we can see it in our available local crates:
bulker list
Bulker config: /home/nsheff/code/bulker/docs_jupyter/bulker_config.yaml
Available crates:
bulker/demo:default -- /home/nsheff/bulker_crates/bulker/demo/default
Updating crates with config changes
One final point. Let's say you make a change to your configuration file, say, by adding a new volume or environment variable. Now, you'd like this change to be reflected in your activated crate. But just typing bulker activate ... after changing the config file will not be enough, because the config settings were read and populated at the bulker load stage.
Therefore, when you update your config file, you'll need to re-load any affected crates. Bulker makes this easy with the bulker reload command.
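To see why a reload is needed, here is a simplified sketch of the idea. It is not bulker's actual template: the make_wrapper function, the volume values, and the exact docker run line are assumptions made for illustration. The function writes an executable with the current volume setting baked in, so changing the setting afterwards leaves the existing file untouched until it is generated again.

```shell
set -eu

crate_dir="$(mktemp -d)"   # stand-in for a crate folder

# Hypothetical helper: write a wrapper script with the volume baked in.
make_wrapper() {
  local volume="$1"
  cat > "$crate_dir/cowsay" <<EOF
#!/bin/sh
docker run --rm -i --volume "$volume:$volume" nsheff/cowsay cowsay "\$@"
EOF
  chmod +x "$crate_dir/cowsay"
}

volume="/home/user"        # setting at "load" time
make_wrapper "$volume"

volume="/data"             # the config changes afterwards...
# ...but the existing wrapper still holds the old value (count is 0).
grep -c "/data" "$crate_dir/cowsay" || true

make_wrapper "$volume"     # the "reload" step regenerates the wrapper
grep -c "/data" "$crate_dir/cowsay"   # the new value is now present (count is 1)
```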
Running commands using bulker crates
Once you have loaded a crate, all it means is that there's a folder somewhere on your computer with a bunch of executables. You can use it like that if you like, by just running these commands directly. For example, the demo crate by default will create the following path: '$HOME/bulker_crates/bulker/demo/default/cowsay'. You can execute this by including the full path:
$HOME/bulker_crates/bulker/demo/default/cowsay boo
 _____
< boo >
 -----
        \   ^__^
         \  (oo)\_______
            (__)\       )\/\
                ||----w |
                ||     ||
This example demonstrates how simple and flexible bulker is under the hood. But using commands like this is cumbersome. It is simpler to add these commands to your PATH, which also makes it easier to use sets of commands as a kind of controlled computational environment. Bulker provides two ways to do this conveniently, depending on your use case: bulker activate and bulker run.
- activate. This will add all commands from a given crate to your PATH and give you a terminal where you can use them. You want to use activate if you want to manage crates like namespaces that you can turn on or off. This is useful for controlling which software versions are used for which tasks, because the manifest controls the versions of software included in a crate.
- run. This will run a single command in a new environment that has a crate prepended to the PATH.
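The difference between the two can be sketched in plain shell, assuming a hypothetical crate folder that holds a stand-in script (bulker_demo_hello, an invented name) rather than a real containerized executable. Setting PATH for a single command mirrors run; prepending the folder for the rest of the session mirrors activate. This only illustrates the PATH mechanics, not how bulker implements either subcommand.

```shell
set -eu

crate_dir="$(mktemp -d)"   # stand-in for a loaded crate folder
printf '#!/bin/sh\necho "hello from the crate"\n' > "$crate_dir/bulker_demo_hello"
chmod +x "$crate_dir/bulker_demo_hello"

# Like "run": PATH is changed for this one command only.
run_output="$(PATH="$crate_dir:$PATH" bulker_demo_hello)"
echo "$run_output"

# Afterwards, the current shell still cannot find the command.
if command -v bulker_demo_hello >/dev/null 2>&1; then
  found_before="yes"
else
  found_before="no"
fi
echo "Found without activating: $found_before"

# Like "activate": PATH is changed for everything that follows.
export PATH="$crate_dir:$PATH"
activate_output="$(bulker_demo_hello)"
echo "$activate_output"
```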
Try it out. First, we'll activate the new environment. On your command line, you just need to type bulker activate demo. In Jupyter, since bulker provides a new shell, we have to use this eval workaround, but you can ignore the complexity if you're not using Jupyter:
eval $(bulker activate demo -p -e)
Bulker config: /home/nsheff/Dropbox/env/bulker_config/zither.yaml
Activating bulker crate: demo
We can now use bulker inspect, which will show us that we've activated the bulker/demo manifest and have cowsay and fortune available:
bulker inspect
Bulker config: /home/nsheff/Dropbox/env/bulker_config/zither.yaml
Crate path: /home/nsheff/bulker_crates/bulker/demo/default
Bulker manifest: bulker/demo
Available commands: ['cowsay', 'fortune']
Here's where the magic happens: let's pipe the output of fortune to cowsay:
fortune | cowsay
 ________________________________________
/ The true Southern watermelon is a boon \
| apart, and not to be mentioned with    |
| commoner things. It is chief of the    |
| world's luxuries, king by the grace of |
| God over all the fruits of the earth.  |
| When one has tasted it, he knows what  |
| the angels eat. It was not a Southern  |
| watermelon that Eve took; we know it   |
| because she repented.                  |
|                                        |
| -- Mark Twain, "Pudd'nhead Wilson's    |
\ Calendar"                              /
 ----------------------------------------
        \   ^__^
         \  (oo)\_______
            (__)\       )\/\
                ||----w |
                ||     ||
The advantage of bulker over vanilla containers
On the surface, this seems the same as running this command in a container that includes both fortune and cowsay. Indeed, the user experience is pretty similar. What separates this process from typical container use is that our command is not actually running in a container but in the host shell, using two commands that each run in a separate container. There is no container that contains both fortune and cowsay; instead, we have an individual container for each command, and each command is wrapped in an executable. Both of these commands are in our PATH because they're both included in the crate.
Activating multiple crates
You can also pass a comma-separated list of crates to either run or activate, which will merge the executables from those crates. This is not practical with vanilla containers because it would require you to build a new container containing the software from both, which would eliminate the advantages of modularity and increase container bloat and disk use.
As an example, let's load another demo crate that adds a new command, pi, which prints out pi to many digits. We can get our cow to quote those digits by activating both of these crates.
bulker load pi -b -f
Bulker config: /home/nsheff/code/bulker/docs_jupyter/bulker_config.yaml
Building images with template: /home/nsheff/code/bulker/docs_jupyter/templates/docker_build.jinja2
Using default tag: latest
latest: Pulling from nsheff/pi
Digest: sha256:6187416a85719fb42bcd4e4c62ffce3b5757c2d17813090cadbd9f4eeb9c9425
Status: Image is up to date for nsheff/pi:latest
docker.io/nsheff/pi:latest
Docker image available as: nsheff/pi
Loading manifest: 'bulker/pi:default'. Activate with 'bulker activate bulker/pi:default'.
Commands available: pi
Now try running a command that requires commands from two different crates. Again, you can use bulker activate pi,demo if in a shell, or use this eval code in a Jupyter notebook:
eval $(bulker activate pi,demo -p -e)
Bulker config: /home/nsheff/Dropbox/env/bulker_config/zither.yaml
Activating bulker crate: pi,demo
pi | cowsay
 _________________________________________
/ 3.1415926535897932384626433832795028841 \
| 971693993751058209749445923078164062862 |
\ 08998628034825342117067                 /
 -----------------------------------------
        \   ^__^
         \  (oo)\_______
            (__)\       )\/\
                ||----w |
                ||     ||
So, outside of Jupyter you'd write:
bulker activate pi,demo
pi | cowsay
Just to make sure you realize what's happening here and why this is so cool: this is not a command running in a single container. In fact, the command itself is running in the host shell, and the pipe (|) is handled by the host shell. The two executables, pi and cowsay, are each run within their own modular container that does only one thing. And these commands are located in different crates, which are activated simultaneously.
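The same mechanics can be imitated without containers. In the sketch below, two hypothetical crate folders each hold one stand-in script (demo_digits and demo_bracket, both invented for this illustration and not real bulker commands). Both folders are prepended to PATH, and the host shell connects the two separate processes with a pipe.

```shell
set -eu

crate_a="$(mktemp -d)"   # stand-in for the first crate folder
crate_b="$(mktemp -d)"   # stand-in for the second crate folder

# Each "crate" provides a single executable.
cat > "$crate_a/demo_digits" <<'EOF'
#!/bin/sh
echo "3.14159"
EOF

cat > "$crate_b/demo_bracket" <<'EOF'
#!/bin/sh
# Read stdin and wrap each line in angle brackets.
sed 's/^/< /; s/$/ >/'
EOF

chmod +x "$crate_a/demo_digits" "$crate_b/demo_bracket"

# Activating both crates amounts to putting both folders on PATH.
export PATH="$crate_a:$crate_b:$PATH"

# The pipe is set up by the host shell; each side is its own process.
piped_output="$(demo_digits | demo_bracket)"
echo "$piped_output"   # prints: < 3.14159 >
```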
Conclusion
That's basically it. If you're a workflow developer, all you need to do is write your own manifest and distribute it with your workflow; in 3 lines of code, users will be able to run your workflow using modular containers, using the container engine of their choice.