
Garbage Collect the GitLab Container Registry to free up storage – CloudSavvy IT




GitLab’s Container Registry provides a convenient place to store your Docker images. Over time, the registry can eat up your disk space as more layers are added. Here’s how to free up storage by removing redundant data.

The Container Registry allows you to store Docker images alongside the source code of your project. If you keep large images in your registry, your storage costs can quickly exceed your expectations. GitLab keeps each layer indefinitely, even after it becomes obsolete.

Setting a Cleanup Policy

The first step in reclaiming your storage space is to configure a Container Registry cleanup policy. Cleanup policies are applied separately to each project, so you can tailor the retention period to each codebase.

Visit your project in GitLab and click the “Settings” link in the sidebar. Switch to the “CI / CD” category and expand the “Clean up image tags” section at the bottom of the page.

Switch on the “Enabled” toggle to activate the cleanup policy. Then choose how often the policy should run; “Every day” is a good default.

In the next section, “Keep these tags”, you can define tags that the cleanup policy will never remove. The two options, “keep the most recent” and “keep tags matching”, are independent of each other. You could choose to keep dev and nightly, plus the five most recent tags. The latest tag is always kept, in addition to any tags you specify.

The next section, “Delete These Tags”, defines which tags are eligible for deletion. Tags that do not match the regex pattern are left untouched. Adjust the “Remove tags older than” value to set the maximum lifespan of a tag before it is cleaned up. When you are done, click the green “Save” button.
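If you want a feel for how the keep and delete patterns interact before saving a policy, you can approximate the rules locally. The following is only a rough sketch: it assumes patterns must match the whole tag name (approximated here with grep -Ex), it ignores the “keep the most recent” and “older than” settings, and both the helper name tags_to_delete and the tag names are made up for illustration. GitLab’s own regex engine may behave differently in edge cases.

```shell
#!/bin/sh
# Hypothetical helper: print the tags a policy would consider for deletion.
# $1 = "keep" regex, $2 = "delete" regex; tag names are read from stdin.
tags_to_delete() {
    keep_regex=$1
    delete_regex=$2
    while IFS= read -r tag; do
        # The latest tag is always kept, as described in the article.
        [ "$tag" = "latest" ] && continue
        # Tags matching the keep pattern are never removed.
        if [ -n "$keep_regex" ] &&
            printf '%s\n' "$tag" | grep -Eqx -- "$keep_regex"; then
            continue
        fi
        # Only tags matching the delete pattern are eligible for removal.
        if printf '%s\n' "$tag" | grep -Eqx -- "$delete_regex"; then
            printf '%s\n' "$tag"
        fi
    done
}

# Made-up tags: keep dev and nightly, treat everything else as deletable.
# Prints v1.0.0 and feature-login, one per line.
printf '%s\n' latest dev nightly v1.0.0 feature-login |
    tags_to_delete 'dev|nightly' '.*'
```

Treat the output as a sanity check on your patterns, not as a prediction of exactly what GitLab will remove.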

Using the API

Applying cleanup policies through the web interface can quickly become tedious. Use the API instead if you need to change multiple projects.

curl --request PUT \
  --header 'Content-Type: application/json;charset=UTF-8' \
  --header "PRIVATE-TOKEN: <your_access_token>" \
  --data-binary '{"container_expiration_policy_attributes":{"cadence":"1month","enabled":true,"keep_n":1,"older_than":"14d","name_regex":"","name_regex_delete":".*","name_regex_keep":"latest"}}' \
  "https://gitlab.example.com/api/v4/projects/<project_id>"

You need to generate an API access token from your profile page in GitLab. Use the token in place of <your_access_token> in the above command. Replace <project_id> in the URL with the ID of your project, which is shown on the project page in GitLab.

Running the above command applies a registry cleanup policy that runs every month and cleans up tags older than 14 days. The latest tag and the single most recent tag (keep_n) are retained; all others are eligible for removal (.*).
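When several projects need the same policy, the request can be wrapped in a small loop. This is a minimal sketch, not an official client: GITLAB_URL, GITLAB_TOKEN, the function names and the project IDs in the usage line are all placeholders, and the payload simply mirrors the policy described above.

```shell
#!/bin/sh
# Sketch: apply one cleanup policy to several projects through the API.
# GITLAB_URL and GITLAB_TOKEN are assumed placeholders you must set yourself.
GITLAB_URL=${GITLAB_URL:-https://gitlab.example.com}

# Print the JSON body: monthly cadence, keep latest plus one recent tag,
# remove anything else older than 14 days.
policy_payload() {
    cat <<'EOF'
{"container_expiration_policy_attributes":{"cadence":"1month","enabled":true,"keep_n":1,"older_than":"14d","name_regex":"","name_regex_delete":".*","name_regex_keep":"latest"}}
EOF
}

# Send the policy to a single project; $1 is the numeric project ID.
apply_policy() {
    curl --fail --silent --show-error --request PUT \
        --header 'Content-Type: application/json;charset=UTF-8' \
        --header "PRIVATE-TOKEN: ${GITLAB_TOKEN}" \
        --data-binary "$(policy_payload)" \
        "${GITLAB_URL}/api/v4/projects/$1"
}

# Apply the policy to every project ID given as an argument.
apply_policy_to_all() {
    for project_id in "$@"; do
        if apply_policy "$project_id" >/dev/null; then
            echo "updated project ${project_id}"
        else
            echo "failed to update project ${project_id}" >&2
        fi
    done
}

# Run only when project IDs are passed on the command line.
if [ "$#" -gt 0 ]; then
    apply_policy_to_all "$@"
fi
```

Saved as a script (the file name is up to you), it could be called as GITLAB_TOKEN=<your_access_token> sh apply-policy.sh 12 34 56, where the IDs are made up. Using --fail makes curl return a non-zero status on HTTP errors, so a rejected request shows up as a failure instead of passing silently.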

Effects of the Cleanup Policy

The cleanup policy removes tags based on the criteria you set. The tags are removed from your container registry. They no longer appear on the Container Registry screen of your project and cannot be pulled by Docker clients.

However, untagging an image is not the same as actually deleting it. The cleanup policy does not reclaim any storage, so you may still see high storage usage even after pruning unused tags.

This is because the image layers remain on your GitLab server, cached for future use. To permanently delete the data, you must then run the Container Registry garbage collection procedure.

Garbage Collection

Running garbage collection deletes all image layers that are not associated with a tag. This results in the removal of images that were untagged by your cleanup policy. It can also remove old layers that became redundant when you pushed a new version of a tag.

Garbage collection must be invoked manually through the GitLab command line interface. Connect to your GitLab server via SSH and run the following command:

sudo gitlab-ctl registry-garbage-collect

The garbage collection process will now run and remove the unused layers from your Container Registry. It searches for untagged images across your entire GitLab instance.

Assuming you ran the cleanup policy first, you should now see a healthy reduction in storage usage. If it’s your first time running garbage collection on a heavily used GitLab installation, you may reclaim several gigabytes of space.
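To see how much space a run actually frees, you can compare the size of the registry data directory before and after. The sketch below assumes the registry stores its data under /var/opt/gitlab/gitlab-rails/shared/registry, which is an assumption about a default Omnibus installation; override REGISTRY_DIR if yours differs. The function names are made up for illustration.

```shell
#!/bin/sh
# Assumed registry data path on an Omnibus install; adjust if yours differs.
REGISTRY_DIR=${REGISTRY_DIR:-/var/opt/gitlab/gitlab-rails/shared/registry}

# Print the size of a directory in kilobytes.
dir_size_kb() {
    du -sk "$1" | cut -f1
}

# Print the space freed in megabytes, given before/after sizes in kilobytes.
freed_mb() {
    echo $(( ($1 - $2) / 1024 ))
}

# Run garbage collection and report roughly how much space was reclaimed.
measure_gc() {
    before=$(dir_size_kb "$REGISTRY_DIR")
    sudo gitlab-ctl registry-garbage-collect
    after=$(dir_size_kb "$REGISTRY_DIR")
    echo "Freed roughly $(freed_mb "$before" "$after") MB"
}

# Usage on the GitLab server: call measure_gc after defining the functions.
```

The figure is only approximate, since other activity on the registry during the run would also change the directory size.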

Delete untagged manifests and layers

You can free up even more space by instructing garbage collection to also delete untagged image manifests and unreferenced layers. This is a more destructive operation, although it is usually the behavior you would expect.

sudo gitlab-ctl registry-garbage-collect -m

Adding the -m flag removes any layer that is not directly associated with a tagged image manifest. This results in the loss of cache layers and intermediate build steps.

Docker and the GitLab Container Registry keep all created layers by default, even if they are no longer referenced. This means that you can always find a previously known layer using the unique content addressable identifier, even if it no longer has a tag.

This is why deleting these layers is not enabled by default. You should be aware of the implications before running the command, as it can have serious consequences in some workflows. That said, using the -m switch is often desirable: it frees up a lot more disk space and shouldn’t have any side effects if you only reference images by tag name.
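One way to judge whether -m is safe for you is to check whether any of your deployment or CI files pull images by digest instead of by tag. The helper below is a hypothetical sketch: the function name and the directory in the usage note are assumptions, and it only finds references in the files you point it at.

```shell
#!/bin/sh
# Hypothetical helper: list lines under a directory that reference an
# image by content digest (image@sha256:<64 hex characters>).
# Exits non-zero when no such reference is found.
find_digest_refs() {
    grep -rnE '@sha256:[0-9a-f]{64}' "$1"
}

# Usage: find_digest_refs ./deploy || echo "no digest references found"
```

If this reports matches, the referenced images may be untagged, in which case running garbage collection with -m could delete them.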

Run garbage collection on a schedule

The cleanup policy runs automatically at the frequency you have configured. Garbage collection is not scheduled by default, which is why an initial run can significantly reduce storage usage.

To run garbage collection on a schedule, you must add the command to cron. Create a file at /etc/cron.d/registry-garbage-collection with the following contents to run garbage collection every Monday at 2am:

SHELL=/bin/sh
PATH=/usr/local/sbin:/usr/local/bin:/sbin:/bin:/usr/sbin:/usr/bin

0 2 * * 1  root gitlab-ctl registry-garbage-collect

Limitations of Garbage Collection

The time it takes to perform garbage collection depends on the amount of data to be deleted. Garbage collection requires the Container Registry service to be stopped while it is running. This means that your users cannot pull or push images until the process is complete.

You can reduce the impact of this downtime by putting the registry in read-only mode, running the command, and then switching back to read/write. The registry stays online throughout, but users cannot push images while it is read-only. Additionally, switching modes requires GitLab to be “reconfigured” (sudo gitlab-ctl reconfigure), which can itself cause downtime depending on how your installation is set up.

Add the following lines to /etc/gitlab/gitlab.rb:

registry['storage'] = {
  'maintenance' => {
    'readonly' => {
      'enabled' => true
    }
  }
}

Run sudo gitlab-ctl reconfigure, then use one of the garbage collection commands. When it’s done, turn off read-only mode by setting the enabled line in your gitlab.rb back to false, and reconfigure GitLab again.
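The read-only workflow can be scripted so that the registry is always switched back to read/write afterwards. This is a sketch under several assumptions: it is run as root, gitlab.rb already contains the readonly block shown above, and the two function names are invented for illustration. Try the sed expression on a copy of your configuration first.

```shell
#!/bin/sh
# Hypothetical helper: set 'enabled' inside the 'readonly' block.
# $1 = path to gitlab.rb, $2 = true or false. A .bak copy is kept.
set_registry_readonly() {
    sed -i.bak "/'readonly' => {/,/}/ s/'enabled' => [a-z]*/'enabled' => $2/" "$1"
}

# Sketch of the full cycle: read-only, collect garbage, back to read/write.
# Assumed to run as root on the GitLab server.
gc_with_readonly() {
    config=${1:-/etc/gitlab/gitlab.rb}
    set_registry_readonly "$config" true &&
        gitlab-ctl reconfigure &&
        gitlab-ctl registry-garbage-collect
    status=$?
    # Always try to restore read/write mode, even if a step above failed.
    set_registry_readonly "$config" false
    gitlab-ctl reconfigure
    return "$status"
}

# Usage: gc_with_readonly            (uses /etc/gitlab/gitlab.rb)
#        gc_with_readonly /path/to/gitlab.rb
```

The sed range limits the substitution to the lines between 'readonly' => { and the next closing brace, so other enabled settings in the file are left alone.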

