5. Advanced topics

5.1 Automatic deployment of CernVM-FS servers and clients

As you may have experienced during this tutorial, it takes quite a bit of manual effort to deploy all the different CernVM-FS components, and you can easily make mistakes. Therefore, we strongly recommend automating this in a production setup with a tool like Ansible or Puppet.

For Ansible, you could take a look at the playbooks of the EESSI project, which use the Ansible role from the Galaxy Project to install and configure both servers and clients. Compute Canada also offers an Ansible role to configure CernVM-FS clients, and a demo release of an Ansible role for Stratum servers.

CERN offers its own Puppet module that allows you to install and configure CernVM-FS servers and clients.
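
For example, with the Galaxy Project's role from Ansible Galaxy (galaxyproject.cvmfs), deploying clients could look roughly as follows; the playbook and inventory file names here are hypothetical:

# Install the Ansible role from Ansible Galaxy
ansible-galaxy install galaxyproject.cvmfs

# Apply it to your machines via your own playbook and inventory
ansible-playbook -i inventory.ini cvmfs.yml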

5.2 Debugging issues

If you are experiencing issues with your CernVM-FS setup, there are various ways to start debugging. Most issues are caused by misconfigured clients (either a configuration error or a wrong public key) or by connection and firewall issues.

5.2.1 Debugging with cvmfs_config on the client machine

In order to find the cause of the issue, you should first determine where it originates. You can start by checking the configuration on your client machine by running:

sudo cvmfs_config chksetup

This should print OK.

To make sure that your configuration is really picked up and set correctly (because of the hierarchical structure of the configuration, it is possible that some parameter gets overwritten by another configuration file), you can dump the effective configuration for your repository:

cvmfs_config showconfig repo.organization.tld

Make sure that at least CVMFS_HTTP_PROXY and CVMFS_SERVER_URL are set correctly, and that the directory pointed to by CVMFS_KEYS_DIR really contains the (correct) public key file.
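
For a correctly configured client, the relevant part of the output could look something like this (all values shown are illustrative):

$ cvmfs_config showconfig repo.organization.tld | grep -E 'HTTP_PROXY|SERVER_URL|KEYS_DIR'
CVMFS_HTTP_PROXY='http://<PROXY_IP>:3128'    # from /etc/cvmfs/default.local
CVMFS_KEYS_DIR='/etc/cvmfs/keys/organization.tld'    # from /etc/cvmfs/config.d/repo.organization.tld.conf
CVMFS_SERVER_URL='http://<STRATUM1_IP>/cvmfs/repo.organization.tld'    # from /etc/cvmfs/config.d/repo.organization.tld.conf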

The probe subcommand can be used for (re)trying to mount the repository, and should print OK:

$ cvmfs_config probe repo.organization.tld
Probing /cvmfs/repo.organization.tld... OK

However, since you are debugging a problem, it probably returns an error...

So, let's enable some debugging output by adding the following line to your client's /etc/cvmfs/default.local:

CVMFS_DEBUGLOG=/path/to/cvmfs.log

Warning

Make sure that the cvmfs user has write permission to the location specified by CVMFS_DEBUGLOG. Otherwise, not only will you get no log file, but it will also lead to client failures.
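
A quick way to verify this is to try creating the file as the cvmfs user:

# This should succeed silently; "Permission denied" means the cvmfs user
# cannot write to the chosen location
sudo -u cvmfs touch /path/to/cvmfs.log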

Now we unmount the repository and try to probe it again, so that the configuration gets reloaded and the debug log gets created:

sudo cvmfs_config umount
cvmfs_config probe repo.organization.tld

You can now check your debug log file, and look for any error messages near the bottom of the file; they may reveal more details about the issue.
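
For example, to quickly scan the most recent entries for problems:

# Show recent error, warning, and failure messages in the debug log
tail -n 100 /path/to/cvmfs.log | grep -iE 'error|warn|fail'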

5.2.2 Debugging connection issues

If the problem turns out to be some kind of connection issue, you can track it down further by manually checking the connections from your client to the proxy and/or Stratum 1 server.

First, let's rule out that it is some kind of firewall issue by verifying that you can actually connect from your client to the appropriate ports on those servers:

telnet <PROXY_IP> 3128
telnet <STRATUM1_IP> 80

If these connections do work, something is probably wrong with the services running on these machines.

Every CernVM-FS repository has a file named .cvmfspublished, and you can use curl on your client to fetch it manually, both directly from the Stratum 1 and via your proxy:

# Without your own proxy, so directly to the Stratum 1:
curl --head http://<STRATUM1_IP>/cvmfs/repo.organization.tld/.cvmfspublished
# With your caching proxy between the client and Stratum 1:
curl --proxy http://<PROXY_IP>:3128 --head http://url-to-your-stratum1/cvmfs/repo.organization.tld/.cvmfspublished

These commands should return HTTP/1.1 200 OK. If the first command returns something else, you should inspect your CernVM-FS, Apache, and Squid configuration (and log) files on the Stratum 1 server. If the first curl command does work, but the second does not, there is something wrong with your Squid proxy; make sure that it is running, configured, and able to access your Stratum 1 server.

5.2.3 Checking the logs of CernVM-FS services

Besides the client log file discussed above, there are some other log files that you can inspect on the different servers.

On the Stratum 0, the main log files are the Apache access and error files, which you can find (on CentOS) in /var/log/httpd.

The Stratum 1 has several services and, hence, several log files that can be of interest: just like on the Stratum 0, there are the Apache log files. Besides those, Squid also has access and cache log files, which can be found in /var/log/squid. The cvmfs_server snapshot commands will log to /var/log/cvmfs/snapshots.log.

Finally, the only relevant service on the proxy server is Squid itself, so /var/log/squid is again the place to find the log files.
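
To follow the relevant logs while reproducing a problem, you could run something like the following on the Stratum 1 (paths assume CentOS defaults):

sudo tail -f /var/log/httpd/access_log /var/log/httpd/error_log \
             /var/log/squid/access.log /var/log/squid/cache.log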

5.3 Garbage collection

As mentioned in the section about publishing, the default configuration of a Stratum 0 enables automatic tagging, which automatically assigns a timestamped tag to each published transaction. However, by default, these automatically generated tags will not be removed automatically. As a result, files that you remove in later transactions will still take up space in your repository...

5.3.1 Setting the lifetime of automatically generated tags

Instead of removing tags manually, you can automatically mark these automatically generated tags for removal after a certain period by setting the following variable in the file /etc/cvmfs/repositories.d/repo.organization.tld/server.conf on your Stratum 0:

CVMFS_AUTO_TAG_TIMESPAN="30 days ago"

This should be a string that can be parsed by the date command, and defines the lifetime of the tags.
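
To inspect the tags of your repository, including the automatically generated ones, you can list them with:

sudo cvmfs_server tag -l repo.organization.tld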

5.3.2 Cleaning up tags marked for removal

In order to actually clean up unreferenced data, garbage collection has to be enabled for the repository by adding CVMFS_GARBAGE_COLLECTION=true in the server.conf configuration file on Stratum 0.

The garbage collector of the CernVM-FS server can then be run using:

sudo cvmfs_server gc repo.organization.tld

The gc subcommand has several options; a useful way to run it, especially if you want to do this with a cron job, is:

sudo cvmfs_server gc -a -l -f

The -a option automatically runs the garbage collection for all your repositories that have garbage collection enabled, and logs to /var/log/cvmfs/gc.log; the -l option makes the command print which objects are actually removed; and the -f option suppresses the confirmation prompt.
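
As a sketch, a weekly cron job using these options could look like this (the schedule and file name are just examples):

# /etc/cron.d/cvmfs_gc (illustrative): run garbage collection for all
# GC-enabled repositories every Sunday at 3am
0 3 * * 0 root /usr/bin/cvmfs_server gc -a -l -f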

Note that you cannot run the garbage collection while a publish operation is ongoing.

5.4 Gateway and Publishers

Only being able to modify your repository on the Stratum 0 server can be a bit limiting, especially when multiple people have to maintain the repository.

A very recent feature of CernVM-FS allows you to set up so-called publisher machines, which are separate systems that are allowed to modify the repository. It also allows for setting up simple ACLs to let a system only access specific subtrees of the repository.

In order to use this feature you also need a gateway machine that has the repository storage mounted. The easiest way to set it up is by having a single system that serves as both the Stratum 0 and the gateway. This is the setup that we will explain here.

Do note that this is a fairly new feature and is not used a lot by production sites yet. Therefore, use it at your own risk!

5.4.1 Gateway

Requirements

The gateway system has the same requirements as a standard Stratum 0 server, except that it also needs an additional port for the gateway service. This port is configurable, but by default port 4929 is used.
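
If your gateway machine runs firewalld (the CentOS default), opening this port could be done with, for example:

sudo firewall-cmd --add-port=4929/tcp --permanent
sudo firewall-cmd --reload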

Installation

Perform the installation steps for the Stratum 0, which can be found in an earlier section. Additionally, install the cvmfs-gateway package:

sudo yum install -y cvmfs-gateway

Then create the repository just like we did on Stratum 0:

sudo cvmfs_server mkfs -o $USER repo.organization.tld

Configuration

The gateway requires you to set up a configuration file /etc/cvmfs/gateway/repo.json. This is a JSON file containing the name of the repository, the keys that can be used by publishers to get access to the repository, and the (sub)path that these publishers are allowed to publish to.

The cvmfs-gateway package installs an example file for you, which you can edit or overwrite. It should look like this:

{
    "version": 2,
    "repos" : [
        {
            "domain" : "repo.organization.tld",
            "keys" : [
                {
                    "id": "keyid1",
                    "path": "/"
                },
                {
                    "id": "keyid2",
                    "path": "/restricted/to/subdir"
                }
            ]
        }
    ],
    "keys" : [
        {
            "type" : "plain_text",
            "id" : "keyid1",
            "secret" : "SOME_SECRET"
        },
        {
            "type" : "plain_text",
            "id" : "keyid2",
            "secret" : "SOME_OTHER_SECRET"
        }
    ]
}

You can choose the key IDs and secrets yourself; the secret has to be given to the owner of the corresponding publisher machine.

Finally, there is a second configuration file /etc/cvmfs/gateway/user.json. This is where you can, for instance, change the port of the gateway service and the maximum length of an acquired lease. Assuming you do not have to change the port, you can leave it as it is.
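
As an illustration, a minimal user.json that keeps the default port but limits leases to two hours might look like this (check the example file installed by the cvmfs-gateway package for the exact field names supported by your version):

{
    "port": 4929,
    "max_lease_time": 7200
}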

Starting the service

To start the gateway service, use:

systemctl start cvmfs-gateway

Warning

If you do run the gateway service on a different machine than the Stratum 0, make sure not to open transactions on your Stratum 0 server anymore, unless you stop the gateway service first. Otherwise, you may corrupt the repository.

5.4.2 Publisher

Requirements

There are no special requirements for a publisher system with respect to resources.

Installation

The publisher needs to have the cvmfs and cvmfs-server packages installed:

sudo yum install -y epel-release  # not needed on CentOS 8
sudo yum install -y https://ecsft.cern.ch/dist/cvmfs/cvmfs-release/cvmfs-release-latest.noarch.rpm
sudo yum install -y cvmfs-server cvmfs

Configuration

The publisher machine only needs three files with keys:

  • the repository's public master key: repo.organization.tld.pub;
  • the repository's public key encoded as X509 certificate: repo.organization.tld.crt;
  • the gateway API key stored in a file named repo.organization.tld.gw.

The first two files can be taken from /etc/cvmfs/keys on your Stratum 0 server. The latter can be created manually and should just contain the following:

plain_text <KEY_ID> <SECRET>

Make sure that <KEY_ID> and <SECRET> correspond to the values you have entered in the gateway configuration.

All these files should be placed in some (temporary) directory on the publisher system.
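
For example, using the first key from the gateway configuration shown earlier, creating the .gw file could look like this (keyid1 and SOME_SECRET must match the values in the gateway's repo.json):

# Create the gateway API key file and restrict its permissions
echo 'plain_text keyid1 SOME_SECRET' > /path/to/keys/dir/repo.organization.tld.gw
chmod 600 /path/to/keys/dir/repo.organization.tld.gw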

Creating the repository

We can now make the repository available for writing on our publisher machine by running:

export S0_IP='<STRATUM0_IP>'
sudo cvmfs_server mkfs -w http://$S0_IP/cvmfs/repo.organization.tld \
                       -u gw,/srv/cvmfs/repo.organization.tld/data/txn,http://$S0_IP:4929/api/v1 \
                       -k /path/to/keys/dir -o $USER repo.organization.tld

Replace <STRATUM0_IP> with the IP address (or hostname) of your gateway / Stratum 0 server (and change 4929 in case you changed the gateway port), and /path/to/keys/dir with the path where you stored the keys in the previous step.

Start publishing!

You should now be able to make changes to the repository by starting a transaction:

cvmfs_server transaction repo.organization.tld

You can also request a lock on only a subtree of the repository, so that other publishers can still change different parts of the repository:

cvmfs_server transaction repo.organization.tld/some/subdir

When you are done, publish the changes:

cvmfs_server publish repo.organization.tld
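
Putting it all together, a complete publishing cycle from the publisher machine might look like this (the file being added is just an example):

# Open a transaction, make a change, and publish it
cvmfs_server transaction repo.organization.tld
echo 'Hello from the publisher!' > /cvmfs/repo.organization.tld/hello.txt
cvmfs_server publish repo.organization.tld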

5.5 Mounting CernVM-FS repositories as an unprivileged user

The default way of installing and configuring the CernVM-FS client requires you to have root privileges. In case you want to use CernVM-FS repositories on systems where you do not have those privileges, there are still some ways to install the client and mount repositories. We will show two different methods: using a Singularity container, and cvmfsexec.

5.5.1 Singularity

Recent versions of Singularity offer a --fusemount option that allows you to mount CernVM-FS repositories. In order for this to work, you will need to install the cvmfs and cvmfs-fuse3 packages inside your container, and add the right configuration files and public keys for the repositories. Furthermore, you need to make two directories on the host system that will store the CernVM-FS cache and sockets; these need to be made available via a bind mount inside the container at /var/lib/cvmfs and /var/run/cvmfs, respectively.

As an example, you can run the EESSI pilot client container (which was built using this Dockerfile) using Singularity by doing:

mkdir -p /tmp/$USER/{var-lib-cvmfs,var-run-cvmfs}
export SINGULARITY_BIND="/tmp/$USER/var-run-cvmfs:/var/run/cvmfs,/tmp/$USER/var-lib-cvmfs:/var/lib/cvmfs"
export EESSI_CONFIG="container:cvmfs2 cvmfs-config.eessi-hpc.org /cvmfs/cvmfs-config.eessi-hpc.org"
export EESSI_PILOT="container:cvmfs2 pilot.eessi-hpc.org /cvmfs/pilot.eessi-hpc.org"
singularity shell --fusemount "$EESSI_CONFIG" --fusemount "$EESSI_PILOT" docker://eessi/client-pilot:centos7-$(uname -m)

Note that you have to be careful when launching multiple containers on the same machine: in this case, they all need a separate location for the cache, as it cannot be shared across containers.

5.5.2 cvmfsexec

As an alternative, especially when Singularity is not available on your host system, you can try cvmfsexec. Depending on the availability of fusermount and user namespaces on the host system, it has several mechanisms for mounting CernVM-FS repositories, either in a user's own file space or even under /cvmfs.

An advantage of this method is that the cache can be shared by several processes running on the same machine, even if you bind the mountpoint into multiple container instances.

Note

This currently only works on RHEL 6/7/8 and its derivatives, and SUSE 15 and its derivatives.
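
A typical way of obtaining and using cvmfsexec is sketched below; the makedist flavor and repository name are examples, see the cvmfsexec README for the details:

# Get cvmfsexec and bundle a CernVM-FS distribution ("default" flavor)
git clone https://github.com/cvmfs/cvmfsexec.git
cd cvmfsexec
./makedist default

# Mount a repository and run a command with it available under /cvmfs
./cvmfsexec repo.organization.tld -- ls /cvmfs/repo.organization.tld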

Besides the cvmfsexec script itself, there is also a singcvmfs script that can be used to easily launch Singularity containers with a CernVM-FS mount; this also uses the aforementioned --fusemount flag. More information about this script can be found on the README page of the cvmfsexec GitHub repository.

5.6 Using a configuration repository

In the first hands-on part of this tutorial we have manually configured our CernVM-FS client.

Although that was not very complicated, we did have to make sure that different things were in the right place and properly named in order to successfully mount the repository. We had to copy the public key of the repository to /etc/cvmfs/keys/<domain>, and create a configuration file in /etc/cvmfs/config.d/<reponame>.<domain>.conf that specifies the location of the key as well as the address(es) of the Stratum 1 server(s) that are available for this repository.

Besides the manual aspect, there is also a maintenance issue here: if the list of Stratum 1 servers changes, for example when additional servers are added to the network, we have to remember to update our configuration file.

CernVM-FS provides an easy way to prevent these issues: using a so-called configuration repository. This is a standard CernVM-FS repository that is mounted under /cvmfs and contains an etc/cvmfs subdirectory with the same structure as the regular /etc/cvmfs. It provides the public keys and configuration of different CernVM-FS repositories, and it is updated automatically when changes are made to it. So there is no more need to manually maintain or update the client configuration for the provided software repositories.
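
On the client, using a configuration repository typically amounts to pointing the CVMFS_CONFIG_REPOSITORY parameter to it, for instance in a file under /etc/cvmfs/default.d (the file and repository names below are illustrative):

# e.g. /etc/cvmfs/default.d/80-organization.conf
CVMFS_CONFIG_REPOSITORY=cvmfs-config.organization.tld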

One limitation in CernVM-FS is that you can only use one configuration repository at a time. If you want to mount additional software repositories for which the public key and configuration are not included in the configuration repository you are using, you have to configure those repositories statically, and maintain those configurations yourself, either manually or by making sure you update the package that provides the configuration.

cvmfs-contrib

Several CernVM-FS configuration repositories, which collect the public keys and configuration for a couple of major organizations, are available via the cvmfs-contrib GitHub organisation; see the website and cvmfs-contrib/config-repo GitHub repository.

Easy-to-install packages for different CernVM-FS configuration repositories are available via both a yum and apt repository.

EESSI

The EESSI project also provides easy-to-install packages for its CernVM-FS configuration repository, which are available through the EESSI/filesystem-layer GitHub repository.

For example, to install the EESSI CernVM-FS configuration repository on CentOS 7 or 8:

sudo yum install -y https://github.com/EESSI/filesystem-layer/releases/download/v0.2.3/cvmfs-config-eessi-0.2.3-1.noarch.rpm

After installing this package, you will have the CernVM-FS configuration repository for EESSI available:

$ ls /cvmfs/cvmfs-config.eessi-hpc.org/etc/cvmfs
contact  default.conf  domain.d  keys

And as a result, you can also access the EESSI pilot software repository at /cvmfs/pilot.eessi-hpc.org!
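
You can verify this by probing the repository, which should again print OK:

cvmfs_config probe pilot.eessi-hpc.org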

