Apt repository mirroring with docker

From small iceberg

Why mirror a distribution's apt repository?

Mirroring an apt repository with Docker can be helpful.

  • It lets you build your containers faster, since packages come over a local connection
  • Debian repositories are sometimes unstable, so a local repository can stabilize your CI
  • You control the lifetime of the repositories, so you can be sure that your very old apps still build over time, even if the distribution owner decides to remove a version from its repositories

Going into the code

A simple Apache instance can serve the repository to your containers.

Here is the code for this simple Dockerfile:

FROM httpd:2.4.48

# install required packages (wget to download; rsync and procps are used by refresh.sh and health.sh)
RUN apt-get update && apt-get install -y wget rsync procps

# copy scripts
RUN echo "**** Copy scripts"
COPY ./refresh.sh /root/refresh.sh
RUN chmod 755 /root/refresh.sh
COPY ./health.sh /root/health.sh
RUN chmod 755 /root/health.sh
COPY ./entrypoint.sh /root/entrypoint.sh
RUN chmod 755 /root/entrypoint.sh

# remove http index
RUN rm -f /usr/local/apache2/htdocs/index.html

# entrypoint : start update script & apache
ENTRYPOINT /root/entrypoint.sh

As you can see, nothing hard: a simple HTTP server with wget to download the official repositories, and some scripts to manage the mirroring.

Let's look at entrypoint.sh:


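#!/bin/bash

cd /root

# launch repo refresh async and keep its PID
nohup ./refresh.sh &
PID=$!

# disable health route if the sync process fails
nohup ./health.sh $PID &

# serve http (httpd-foreground is the launcher shipped with the official httpd image)
exec httpd-foreground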

Here again, nothing scary. We launch the refresh.sh script in the background, capture its PID, and launch the health.sh script to monitor the refresh script.

Here is the health script:


#!/bin/bash
# wait until the watched process exits, then remove the health file
while ps -p "$1" > /dev/null; do sleep 1; done
rm -f /usr/local/apache2/htdocs/healthz

Nothing more to say: if the refresh script stops, we remove the file that tells the orchestrator (like Kubernetes) that our pod is healthy. You can then configure alerts to handle the problem.
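The interplay between entrypoint.sh and health.sh can be tried locally without Apache. This is only a sketch under assumptions: the watch_pid function, the temporary directory and the short sleep standing in for refresh.sh are hypothetical and not part of the original scripts. It also uses kill -0 instead of ps -p, so it does not depend on procps being installed.

```shell
#!/bin/bash
# Sketch: capture the PID of a background job with $! and remove a
# health marker file once that job exits. All names here are illustrative.

# Block until the process with the given PID is gone, then drop the marker.
watch_pid() {
  local pid="$1" marker="$2"
  while kill -0 "$pid" 2>/dev/null; do sleep 1; done
  rm -f "$marker"
}

workdir="$(mktemp -d)"
marker="$workdir/healthz"
echo "ok" > "$marker"          # pretend the service is healthy

sleep 1 &                      # stand-in for ./refresh.sh
PID=$!                         # PID of the most recent background job

watch_pid "$PID" "$marker" &   # stand-in for ./health.sh $PID
watcher=$!

wait "$watcher"                # returns once the marker has been removed
if [ -e "$marker" ]; then state="healthy"; else state="unhealthy"; fi
echo "$state"
rm -rf "$workdir"
```

Because the stand-in job exits after one second, the marker disappears and the sketch reports the service as unhealthy, which is the signal an orchestrator probe would react to.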

The core of the system: downloading and refreshing repositories

The last thing missing from this stack is the repository data.

We must download the needed repositories and refresh them periodically.

In my case, I have an old containerized application based on Ubuntu 12.04. This release is now unmaintained and has been archived in a legacy repository by Canonical. Nothing tells me that they will still keep this legacy repository online tomorrow. The other thing to take into account is that these repositories are now static. It is therefore more reliable to download this repository separately, only once, into a persistent volume.
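The download-once behaviour for a static repository comes down to a marker file kept on the persistent volume. A minimal sketch, assuming a hypothetical run_once helper, a fake download function and a temporary directory in place of the real volume:

```shell
#!/bin/bash
# Sketch: run an expensive step only if its marker file is absent.
# run_once and fake_download are illustrative names, not from the original scripts.

run_once() {
  local marker="$1"; shift
  if [ -f "$marker" ]; then
    echo "already downloaded"
  else
    # Create the marker only when the command succeeds.
    "$@" && touch "$marker"
  fi
}

volume="$(mktemp -d)"            # stands in for the persistent volume
count_file="$volume/count"
echo 0 > "$count_file"

# Fake download that only counts how often it was called.
fake_download() {
  echo $(( $(cat "$count_file") + 1 )) > "$count_file"
}

first="$(run_once "$volume/done" fake_download)"
second="$(run_once "$volume/done" fake_download)"
calls="$(cat "$count_file")"
echo "calls=$calls second=$second"
rm -rf "$volume"
```

Writing the check as if/else rather than as a chain of && and || keeps the download from running again once the marker exists, since the shell evaluates such chains strictly from left to right.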

Once the download of this persistent part is finished, we can download the other repositories in parallel. For now, these are the Debian and Ubuntu repositories.
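Downloading in parallel only helps if the script then waits for every job before going further. Here is a small sketch of that pattern, with assumptions stated loudly: fetch_with_retry and fake_mirror are hypothetical names, and the "downloads" are fake functions that fail once and then succeed, so nothing touches the network.

```shell
#!/bin/bash
# Sketch: start several retrying jobs in the background, then wait for all.
# fetch_with_retry and fake_mirror are illustrative only.

out="$(mktemp -d)"

# Retry a command until it succeeds, like a `while true; do ... && break; done` loop.
fetch_with_retry() {
  until "$@"; do sleep 1; done
}

# Fake download: fails on the first attempt, succeeds on the second.
fake_mirror() {
  local name="$1"
  if [ -f "$out/$name.tried" ]; then
    echo "mirrored" > "$out/$name"
  else
    touch "$out/$name.tried"
    return 1
  fi
}

fetch_with_retry fake_mirror ubuntu &
fetch_with_retry fake_mirror debian &

wait    # block until every background job has finished

result="$(cat "$out/ubuntu") $(cat "$out/debian")"
echo "$result"
rm -rf "$out"
```

Without the wait, the lines after the background jobs would run while the downloads are still in progress.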

Once everything is downloaded, we use "mv" to bring the new data online as fast as possible.
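The swap itself can be sketched in a throwaway directory. The swap_in helper and the directory layout below are assumptions made for the example, not the original script:

```shell
#!/bin/bash
# Sketch: publish a freshly synced tree by renaming directories.
# swap_in and the directory layout are illustrative.

swap_in() {
  local fresh="$1" live="$2" old="$3"
  rm -rf "$old"
  # Move the currently served tree out of the way, if there is one.
  if [ -d "$live" ]; then mv "$live" "$old"; fi
  mv "$fresh" "$live"
}

root="$(mktemp -d)"
mkdir -p "$root/htdocs/ubuntu" "$root/staging/ubuntu"
echo "v1" > "$root/htdocs/ubuntu/Release"
echo "v2" > "$root/staging/ubuntu/Release"

swap_in "$root/staging/ubuntu" "$root/htdocs/ubuntu" "$root/old"

served="$(cat "$root/htdocs/ubuntu/Release")"
previous="$(cat "$root/old/Release")"
echo "served=$served previous=$previous"
rm -rf "$root"
```

Keep in mind that mv is a quick rename only when source and destination are on the same filesystem. Across filesystems it falls back to copying, and the switch is no longer near-instant.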

The only thing that remains is to tell the orchestrator that the pod is ready to serve by creating the "healthz" file, and then to wait one month before refreshing.

Putting it all together:


#!/bin/bash

# sync every month
while true; do
  # tell sync in progress
  echo "active" > /usr/local/apache2/htdocs/syncing

  echo "**** Mirror unmaintained ubuntu releases"
  # only if not downloaded before: unmaintained repos are static - it is recommended to have a persistent volume on /root/old-releases.ubuntu.com
  if [ -f /root/old-releases.ubuntu.com/done ]; then
      echo "already downloaded"
  else
      cd /root && \
      while true; do wget \
         --continue \
         --mirror \
         --no-verbose \
         --no-parent \
         --exclude-directories=/old-images/ubuntu/.temp \
         --reject "current" \
         --reject "pxelinux.cfg" \
         ftp://old-releases.ubuntu.com/old-images/ubuntu && \
      touch /root/old-releases.ubuntu.com/done && break; done
  fi
  # merge into the maintained ubuntu repo in order to serve all dists in one place
  echo "syncing"
  cd /root/old-releases.ubuntu.com/old-images && rsync -rt ubuntu /root/archive.ubuntu.com

  echo "**** Mirror maintained ubuntu releases"
  cd /root && \
      while true; do wget \
         --continue \
         --mirror \
         --no-verbose \
         --no-parent \
         --exclude-directories=/ubuntu/.temp \
         --reject "current" \
         --reject "pxelinux.cfg" \
         ftp://archive.ubuntu.com/ubuntu/ && \
      break; done &

  echo "**** Mirror debian releases"
  cd /root && \
      while true; do wget \
         --continue \
         --mirror \
         --no-verbose \
         --no-parent \
         --exclude-directories=/ubuntu/.temp \
         --reject "current" \
         --reject "pxelinux.cfg" \
         ftp://deb.debian.org/debian/ && \
      break; done &

  # wait for both background downloads to finish before moving anything
  wait

  echo "**** Move mirrors to http server"
  rm -rf /root/old || echo "Old not present"
  mv /usr/local/apache2/htdocs/ubuntu /root/old
  mv /root/archive.ubuntu.com/ubuntu /usr/local/apache2/htdocs/ubuntu
  rm -rf /root/old || echo "Old not present"
  mv /usr/local/apache2/htdocs/debian /root/old
  mv /root/deb.debian.org/debian /usr/local/apache2/htdocs/debian

  echo "healthy"
  echo "ok" > /usr/local/apache2/htdocs/healthz

  # tell sync is finished
  echo "inactive" > /usr/local/apache2/htdocs/syncing

  # wait one month (30 days)
  sleep 2592000
done

Let's test the repo

Just build this small Dockerfile to test (replace the IP and port with yours):

FROM ubuntu:20.04

RUN echo "deb http://<ip>:<port>/ubuntu focal main restricted universe multiverse" > /etc/apt/sources.list
RUN echo "deb http://<ip>:<port>/ubuntu focal-updates main restricted universe multiverse" >> /etc/apt/sources.list
RUN echo "deb http://<ip>:<port>/ubuntu focal-security main restricted universe multiverse" >> /etc/apt/sources.list
RUN echo "deb http://<ip>:<port>/ubuntu focal-backports main restricted universe multiverse" >> /etc/apt/sources.list
RUN echo "deb http://<ip>:<port>/ubuntu focal partner" >> /etc/apt/sources.list

RUN apt-get update && apt-get install -y vim && sleep 20
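If you test against several mirrors, the sources.list lines can also be generated rather than typed by hand. A rough sketch, assuming a hypothetical make_sources_list helper and the documentation address 192.0.2.10:8080 as a placeholder for your own IP and port:

```shell
#!/bin/bash
# Sketch: print sources.list entries for a mirror.
# make_sources_list is a hypothetical helper; 192.0.2.10:8080 is a placeholder address.

make_sources_list() {
  local base_url="$1" codename="$2" suite
  for suite in "" "-updates" "-security" "-backports"; do
    echo "deb ${base_url} ${codename}${suite} main restricted universe multiverse"
  done
}

list="$(make_sources_list "http://192.0.2.10:8080/ubuntu" focal)"
echo "$list"
lines="$(echo "$list" | wc -l | tr -d ' ')"
first_line="$(echo "$list" | head -n 1)"
```

The output can be redirected to /etc/apt/sources.list inside the test container in place of the individual echo lines.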

Get it on GitHub

You can find all the sources at https://github.com/sebk69/small-repo-mirror