Roughly 10 years ago, CERN had a software problem.
How do you distribute the big software stack necessary to analyze and simulate collision data to ~100 datacenters around the world? How do you do it while minimizing bandwidth? And how do you do it while maximizing performance?
The standard answer to these questions today would be containers. However, 10 years ago container technology was in its infancy. Moreover, distributing container images is rather expensive in terms of bandwidth, and therefore in terms of startup time.
CVMFS is a FUSE filesystem that features lazy loading, is extremely cacheable, and was developed for the distribution of software. The distribution happens over standard HTTP, which makes it possible to leverage the existing infrastructure for caching.
Software is installed once on a single source of truth storage, and it is distributed to (potentially) millions of clients who access it from a read-only filesystem.
Think about all the times you needed a specific compiler, or a specific tool that was not available in the standard package manager. If the software were accessible from an online filesystem, you could just use it, saving tons of time.
Up to now, CVMFS has been deployed mainly in private installations inside big HPC centers. This explains why most developers have never heard of it. However, it is extremely stable software, used daily by thousands of users and powering multiple large-scale datacenters.
Today we are announcing packages.redbeardlab.com, a public CVMFS installation that includes common Linux utilities and basic software:
- basic linux utilities (binutils, coreutils, bash, zsh, tar)
- a humble selection of compilers (gcc 9, clang 11, go 1.15, rustc 1.45)
- interpreters (python 3.9, python 2.7, lua 5.3)
- software for development (git, automake, cmake, autoconf, autogen, bison, flex)
- databases (postgres, mysql, redis)
- editors (neovim, emacs)
- common linux utilities (curl, htop, iotop, jq, lua, ripgrep, time, tmux, wget, zip)
To get started, it is necessary to install the CVMFS client and to set it up correctly.
The CVMFS client can be found on the official homepage: cernvm.cern.ch/fs
It is possible to install it as either a DEB or an RPM package.
# using yum
sudo yum install https://ecsft.cern.ch/dist/cvmfs/cvmfs-release/cvmfs-release-latest.noarch.rpm
sudo yum install -y cvmfs

# using apt
sudo apt-get install lsb-release
wget https://ecsft.cern.ch/dist/cvmfs/cvmfs-release/cvmfs-release-latest_all.deb
sudo dpkg -i cvmfs-release-latest_all.deb
rm -f cvmfs-release-latest_all.deb
sudo apt-get update
sudo apt-get install -y cvmfs
After the installation of the CVMFS client, it is necessary to set up CVMFS.
All the configuration can be found in this tarball, and it can be installed with:
curl -L https://github.com/RedBeardLab/packages.redbeardlab.com/releases/latest/download/packages.redbeardlab.com.config.tar | sudo tar -x -C /etc
To check that everything works, run cvmfs_config probe; it should return OK.
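The check above can be wrapped in a small helper, for example at the start of a provisioning script. This is only a sketch: the function name check_cvmfs_repo is made up for illustration, and it assumes that the exit status of cvmfs_config probe reflects whether the repository could be mounted.

```shell
# Hypothetical helper (not part of CVMFS): verify that a repository is usable.
# The repository name defaults to the one announced in this post.
check_cvmfs_repo() {
    repo="${1:-packages.redbeardlab.com}"

    # Without the client installed there is nothing to probe.
    if ! command -v cvmfs_config >/dev/null 2>&1; then
        echo "cvmfs_config not found: install the CVMFS client first" >&2
        return 2
    fi

    # Assumption: the probe exits non-zero when the repository cannot be mounted.
    if cvmfs_config probe "$repo" >/dev/null 2>&1; then
        echo "OK: $repo"
    else
        echo "FAILED: $repo (try: cvmfs_config chksetup)" >&2
        return 1
    fi
}
```

Calling check_cvmfs_repo with no arguments probes packages.redbeardlab.com; on failure, cvmfs_config chksetup is usually a good next step to find configuration problems.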
As an alternative, it is possible to run the CVMFS client inside a Docker container and expose the /cvmfs mount point to the host.
docker run -d \
  --device /dev/fuse \
  --volume /cvmfs:/cvmfs:shared \
  --volume /var/lib/cvmfs:/var/lib/cvmfs \
  --privileged \
  redbeardlab/packages
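Since the container starts in the background, the mount point may take a moment to show up on the host. A script can wait for it with something like the sketch below. The helper name wait_for_repo and the default timeout of 30 seconds are assumptions made for this example; they do not come from the image or from CVMFS.

```shell
# Hypothetical helper: wait until a directory lists at least one entry,
# checking once per second for at most `timeout` seconds.
wait_for_repo() {
    dir="${1:-/cvmfs/packages.redbeardlab.com}"
    timeout="${2:-30}"
    elapsed=0

    while [ "$elapsed" -lt "$timeout" ]; do
        # The repository is considered ready once its top level is not empty.
        if [ -n "$(ls -A "$dir" 2>/dev/null)" ]; then
            echo "ready: $dir"
            return 0
        fi
        sleep 1
        elapsed=$((elapsed + 1))
    done

    echo "timed out waiting for $dir" >&2
    return 1
}
```

For example, wait_for_repo /cvmfs/packages.redbeardlab.com 60 gives the container up to a minute before the script gives up.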
Please note that both the tarball and the Docker images are generated by a CI script.
After the installation it is possible to navigate the CVMFS repository using standard Linux tools.
Moreover, a README.md (/cvmfs/packages.redbeardlab.com/README.md) will guide you on how to use the repository.
The simplest thing to do is to source the setup script, /cvmfs/packages.redbeardlab.com/setup.sh.
This will set up your system to use all the installed software as a fallback, that is, only when you don't have the same software installed on the local system.
The main use case of CVMFS is latency-insensitive workloads: it works out of the box for CI scripts and for running long-lasting servers.
Running a service from CVMFS is usually faster than installing the packages from a repository and then running it.
What CVMFS is not well suited for is interactive use. If you need to invoke bash, then awk, then grep, then python, then ruby, then jq, and so on, while waiting in front of the terminal, then CVMFS will look slow.
Unless, that is, the data is already in the local cache; in that case the performance difference with respect to the local filesystem is negligible.
Fortunately, CVMFS automatically manages a local LRU cache.
When a file is downloaded from the CVMFS server, it is automatically stored in the local filesystem; when the same file is requested again, the local copy is used.
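Because a file is cached after its first read, one way to soften the interactive slowness is to read the tools you expect to need ahead of time. The sketch below does exactly that. The helper name warm_cache is made up for this example, and the -L flag is there on the assumption that some tools may be reachable through symlinks.

```shell
# Hypothetical helper: read every regular file under the given paths once,
# so that later accesses can be served from the local cache.
warm_cache() {
    for path in "$@"; do
        if [ ! -e "$path" ]; then
            echo "skip (not found): $path" >&2
            continue
        fi
        # Reading the bytes is what pulls a file into the cache;
        # the content itself is discarded. -L follows symlinks.
        find -L "$path" -type f -exec cat {} + >/dev/null
        echo "warmed: $path"
    done
}
```

For example, warm_cache /cvmfs/packages.redbeardlab.com/bin reads everything in the bin directory once. Keep in mind that the cache is an LRU with a size limit, so warming more data than the cache can hold only evicts what was warmed first.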
Suggested use cases
packages.redbeardlab.com targets developer use cases.
Enhancing local workstations
The suggestion would be to put /cvmfs/packages.redbeardlab.com/bin at the end of the $PATH on local workstations.
This will not disturb the local workflow, but it will provide all the software installed on packages.redbeardlab.com on-demand.
It can be very useful when you want to try software without actually installing it.
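The $PATH suggestion can be sketched as a few lines for a shell startup file such as ~/.profile. The helper name append_to_path is an assumption made for this example; it only avoids adding the same directory twice when the file is sourced more than once.

```shell
# Append a directory to PATH only if it is not already listed.
append_to_path() {
    case ":$PATH:" in
        *":$1:"*) ;;              # already present: leave PATH untouched
        *) PATH="$PATH:$1" ;;     # add at the end, so local tools are found first
    esac
    export PATH
}

append_to_path /cvmfs/packages.redbeardlab.com/bin
```

Because the directory goes last, the shell only reaches for CVMFS when a command is not found anywhere else in $PATH.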
Sometimes it is necessary to quickly run some application: you can either pull down a Docker container or invoke it from /cvmfs/packages.redbeardlab.com.
Very often this is the case for compilers. Need to test your software with different compilers or different compiler versions? Just invoke a different compiler.
The first, time-consuming step of most CI pipelines is installing all the necessary dependencies for building and testing your application.
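In a CI job, that installation step could be replaced by something like the sketch below, which prefers the tools published on CVMFS and reports failure when the repository is not mounted. The function name setup_ci_tools is made up for this example; the setup script path is the one used elsewhere in this post.

```shell
# Hypothetical CI step: use the tools from CVMFS when the repository is there,
# and return non-zero otherwise so the job can fall back to a normal install.
setup_ci_tools() {
    setup_script="${1:-/cvmfs/packages.redbeardlab.com/setup.sh}"

    if [ -r "$setup_script" ]; then
        # Source (not execute) the script, so its changes apply to this shell.
        . "$setup_script"
        echo "using CVMFS tools"
    else
        echo "CVMFS not available: $setup_script" >&2
        return 1
    fi
}
```

A job could then run setup_ci_tools || sudo apt-get install -y cmake gcc, where the package list is only a placeholder for whatever the build really needs.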
Long Running Servers
It can also make sense to deploy long-running services on top of CVMFS. This is especially true if it is necessary to spin up a cluster of several machines, all with the same software.
Using the software installed
Once CVMFS is running, the last step is to actually use the installed software.
The simplest way to set up a working system is to invoke:
$ source /cvmfs/packages.redbeardlab.com/setup.sh
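After sourcing the script, it can be handy to see where a given command actually comes from. The helper below is a small sketch for that; the name tool_origin is an assumption made for this example and is not something the repository provides.

```shell
# Hypothetical helper: report whether a command resolves to the CVMFS
# repository or to the local system.
tool_origin() {
    location=$(command -v "$1") || { echo "$1: not found"; return 1; }

    case "$location" in
        /cvmfs/*) echo "$1: cvmfs ($location)" ;;
        *)        echo "$1: local ($location)" ;;
    esac
}
```

Running tool_origin gcc, for example, shows whether gcc comes from the local installation or from the /cvmfs fallback.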
Alternatively, it is possible to open an issue on this GitHub repo.
It is possible to add software to /cvmfs/packages.redbeardlab.com; this will allow everybody to use it and benefit from it.
Or open an issue against the GitHub repository.
While we are running the infrastructure, the complexity behind this project is huge. It was possible to tame all this complexity only thanks to very solid software and with the help of very solid internet businesses.
Definitely a big thanks to CERN and CVMFS, who have developed software of outstanding quality.
A big thanks to the Nix project, which allows us to bootstrap a complete software stack (from the compilers up to k8s) in a reasonably quick and simple way.
Then, we are using Backblaze and their S3 API to store all the data managed by CVMFS.
DO NOT rely on this for business-critical needs. Please get in touch if your business needs a similar solution.