Koji Garbage Collection at CERN


By garbage collection, we mean removing builds that are no longer needed from the system. While it would be nice to keep everything forever, we have a finite amount of space and the RPMs pile up quickly. The current implementation focuses on removing builds, but other kinds of cleanup may be added in the future.

The intent of garbage collection is to be as careful as possible and only remove builds that truly are no longer needed. The process happens in stages and a notification email is sent to the build owner when a build is marked for deletion.

Stages


The three stages of garbage collection are:

  • pruning: obsoleted builds are untagged according to policies
  • trashing: untagged/unreferenced builds are placed in the trashcan
  • deleting: builds in the trashcan are deleted

You might envision a build's "life-cycle" as follows:

tagged -> untagged -> in_trashcan -> deleted

Pruning

In the pruning step, unneeded builds are automatically removed from certain tags according to a set of policies. These policies are flexible: rules can be based on tag, package, age (within tag), order (within tag), and signatures. More details on this are presented below.

Because of the policies, the pruning step can be made to behave in different ways. However, the overall intent is to remove old builds from tags. Generally policies will be along the lines of "keep the latest three builds of each package in such-and-such tag."

Please note that pruning does not delete builds; it only untags them (though this may eventually lead to their deletion via the other stages). Since a build can carry several tags, it may be untagged from one tag but remain in another.

Trashing

Trashing a build simply means tagging it with the special 'trashcan' tag. We say that such a build is 'marked for deletion' or 'has been placed in the trashcan.' The point of this step is to provide a safety net and give the build a chance to be salvaged if need be. The garbage collector will only place a build in the trashcan if it satisfies these basic requirements:

  • the build is untagged, and has been untagged for at least 5 days (actual delay configurable)
  • the build is not signed with a protected key (all keys are protected by default)
  • the build has not been used in the buildroot of another completed build
  • the build has not been used in any buildroot for 5 days (same configurable delay as above)

If a build satisfies these conditions, it is placed in the special trashcan tag and an email notification is sent to the build owner.
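The four requirements above can be read as a single eligibility check. The following is a minimal sketch only, not the garbage collector's actual code: the `BuildState` record, its field names, and the `is_trash_eligible` function are hypothetical, and the 5-day delay is the default mentioned above.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class BuildState:
    """Hypothetical snapshot of a build; field names are illustrative only."""
    tags: list                              # tags the build currently carries
    untagged_since: Optional[datetime]      # when the build lost its last tag
    has_protected_signature: bool           # signed with a protected key?
    used_in_completed_buildroot: bool       # used to build another completed build?
    last_buildroot_use: Optional[datetime]  # most recent use in any buildroot


def is_trash_eligible(build, now, delay=timedelta(days=5)):
    """Return True only if all four requirements hold."""
    # 1. untagged, and untagged for at least `delay`
    if build.tags or build.untagged_since is None:
        return False
    if now - build.untagged_since < delay:
        return False
    # 2. not signed with a protected key
    if build.has_protected_signature:
        return False
    # 3. never used in the buildroot of another completed build
    if build.used_in_completed_buildroot:
        return False
    # 4. not used in any buildroot within the last `delay`
    if build.last_buildroot_use is not None and now - build.last_buildroot_use < delay:
        return False
    return True
```

A build that fails any single check stays out of the trashcan, which matches the stated intent of being as careful as possible.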

Deleting

In the deleting step, each build in the trashcan is examined. If it is still eligible for deletion and has been in the trashcan longer than the grace period (4 weeks, configurable), it is deleted. If it is no longer eligible (e.g. it has been tagged elsewhere or somehow acquired a protected signature), it is removed from the trashcan tag (this process is called salvage).
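The decision made for each build in the trashcan can be sketched as a small function. This is an illustration under assumptions, not the real implementation: the function name and return values are hypothetical, and the "wait" outcome is inferred from the text (a build that is still eligible but has not yet outlived the grace period simply stays in the trashcan).

```python
from datetime import datetime, timedelta


def deletion_step(still_eligible, trashed_at, now, grace_period=timedelta(weeks=4)):
    """Decide what happens to one build found in the trashcan.

    Returns one of "salvage", "delete" or "wait" (hypothetical labels).
    """
    if not still_eligible:
        # e.g. tagged elsewhere, or it acquired a protected signature
        return "salvage"
    if now - trashed_at > grace_period:
        # in the trashcan longer than the grace period
        return "delete"
    # still eligible, but the grace period has not expired yet
    return "wait"
```

For instance, a build trashed five weeks ago that is still eligible yields "delete", while the same build tagged elsewhere in the meantime yields "salvage".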

When a build is actually deleted, the files are removed from disk and some (but not all) of the data about the build is removed from the DB. The residual data is quite small, though. In particular, the build entry and rpm entries are still present. This prevents reuse of the build's NVR (name-version-release) or of its RPMs' NVRAs (name-version-release.arch).

How to protect a build


If a build is marked for deletion, the odds are that this is the correct thing to do. However, it is possible that a build was mistakenly untagged. If you believe this is the case, the fix is to make sure the build is properly tagged. If you have any issues, please open a ticket.

More about pruning policies


The pruning policy is a series of rules. During pruning, the garbage collector goes through each tag in the system and considers its contents. For each build within the tag, it goes through the pruning rules in order until it finds one that matches, then applies that rule's action to the build.

In the policy configuration, each line is a rule, and the first matching rule wins. The format is:

test <args> [ && test <args> && ...] :: action
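To make the rule format concrete, here is a minimal parsing sketch. It is not the garbage collector's own parser: `parse_rule` is a hypothetical helper, and treating `#` as a comment marker is an assumption based on the comment line in the policy shown further down.

```python
def parse_rule(line):
    """Split one policy line into ([(test_name, [args...]), ...], action).

    Returns None for blank lines and comment-only lines.
    """
    line = line.split("#", 1)[0].strip()  # assumption: '#' starts a comment
    if not line:
        return None
    tests_part, sep, action = line.rpartition("::")
    if not sep or not action.strip():
        raise ValueError("rule needs the form 'tests :: action': %r" % line)
    tests = []
    for chunk in tests_part.split("&&"):
        words = chunk.split()
        if not words:
            raise ValueError("empty test in rule: %r" % line)
        # first word is the test name, the rest are its arguments
        tests.append((words[0], words[1:]))
    return tests, action.strip()


rule = parse_rule("tag *-testing *-qa && order >= 2 :: untag")
# rule == ([("tag", ["*-testing", "*-qa"]), ("order", [">=", "2"])], "untag")
```

Each test is kept as a name plus its raw arguments, so the patterns and comparisons can be interpreted separately by whatever evaluates the rule.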

The available tests are:

  • tag: the name of the tag must match one of the patterns
  • package: the name of the package must match one of the patterns
  • age: a comparison against the length of time since the build was tagged. This is not the same as the age of the build.
  • sig: true if any of the build's component RPMs are signed with a matching key
  • order: a comparison against the order number of the build within a given tag. The order number is the number of more recently tagged builds for the same package within the tag. For example, the latest build of glibc in dist-f8 has order number 0, the next latest has order number 1, and so on. Note that the 'skip' action modifies this -- the build is kept, but is not counted for ordering.

Note that the tests are not being applied to just a build, but to a build within a tag. If a build is multiply tagged, it will be checked against these policies for each tag and may be kept in some but untagged in others.
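The order number is easiest to see with a small computation. The sketch below is a simplification under stated assumptions: `order_numbers` is a hypothetical helper, the set of skipped builds is passed in up front (in practice 'skip' is the outcome of a policy rule), and skipped builds are given `None` to show that they are not counted. The NVR strings are made up for illustration.

```python
from collections import defaultdict


def order_numbers(tagged_builds, skipped=frozenset()):
    """Compute order numbers for the builds in one tag.

    tagged_builds: list of (package, nvr), most recently tagged first.
    skipped: NVRs treated as 'skip' (kept, but not counted for ordering).
    Returns {nvr: order}, with None for skipped builds.
    """
    counted = defaultdict(int)  # per package: counted builds seen so far
    orders = {}
    for package, nvr in tagged_builds:
        if nvr in skipped:
            orders[nvr] = None
            continue
        # number of more recently tagged, counted builds of the same package
        orders[nvr] = counted[package]
        counted[package] += 1
    return orders


# Hypothetical contents of one tag, newest first
builds = [("glibc", "glibc-2.7-3"), ("bash", "bash-3.2-2"),
          ("glibc", "glibc-2.7-2"), ("glibc", "glibc-2.7-1")]
print(order_numbers(builds))
print(order_numbers(builds, skipped={"glibc-2.7-2"}))
```

In the second call, glibc-2.7-1 moves from order 2 to order 1 because the skipped build no longer counts against it.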

The available actions are:

  • keep: do not untag the build from this tag
  • skip: like keep, but do not count the build for ordering
  • untag: untag the build from this tag

Note that, regardless of any policies, locked tags are left alone.

At present, the pruning policy is:

  tag *-testing *-qa && order >= 2 :: untag
  tag *-testing *-qa && order >= 1 && age > 12 weeks :: untag
  tag *-stable *-stable1 *-stable2 *-stable3 && order >= 3 :: untag

  # default: do nothing
  tag * :: keep

This could be summarized as follows:

  • testing and qa tags always keep the most recently tagged build of each package.
  • testing and qa tags may keep one more build of each package (the second most recently tagged), provided it was tagged no more than 12 weeks ago.
  • stable tags keep the last 3 tagged builds of each package, regardless of their age.
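The effect of the policy listed above can be checked by evaluating its rules by hand for a few cases. The following sketch hard-codes those rules in order, first match wins. It is an illustration only: `prune_action` is a hypothetical function, shell-style glob matching of tag names is assumed, locked tags and signatures are ignored, and the tag names in the usage lines are made up.

```python
from datetime import timedelta
from fnmatch import fnmatch

TESTING_PATTERNS = ("*-testing", "*-qa")
STABLE_PATTERNS = ("*-stable", "*-stable1", "*-stable2", "*-stable3")


def tag_matches(tag, patterns):
    """True if the tag name matches any pattern (assumed glob semantics)."""
    return any(fnmatch(tag, pattern) for pattern in patterns)


def prune_action(tag, order, age):
    """Return "untag" or "keep" for one build within one tag.

    order: order number of the build within the tag (0 = latest).
    age:   time since the build was tagged, as a timedelta.
    """
    if tag_matches(tag, TESTING_PATTERNS) and order >= 2:
        return "untag"
    if tag_matches(tag, TESTING_PATTERNS) and order >= 1 and age > timedelta(weeks=12):
        return "untag"
    if tag_matches(tag, STABLE_PATTERNS) and order >= 3:
        return "untag"
    return "keep"  # default rule: tag * :: keep


# Hypothetical tag names, for illustration only
print(prune_action("example9-testing", 1, timedelta(weeks=4)))   # keep
print(prune_action("example9-testing", 1, timedelta(weeks=13)))  # untag
print(prune_action("example9-stable", 2, timedelta(weeks=100)))  # keep
print(prune_action("example9-stable", 3, timedelta(days=1)))     # untag
```

Under these rules a testing or qa tag holds at most the builds with order 0 and 1 for each package, and a stable tag holds those with order 0, 1 and 2.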

Irrespective of the garbage collection policy, the repositories generated from testing tags will contain only the last tagged build. The other builds are part of the tag (until they're removed by this garbage collection policy), but they will not be included in the generated repos.