USING PODMAN instead of DOCKER in Azure Runners

Using Podman in Azure runners for ADO, and pipeline pitfalls


Why use PODMAN

Leaving aside that Podman is more secure than Docker because it does not require a separate daemon running containers with root privileges, it also offers a large ecosystem (https://github.com/containers) and features such as Image Trust (https://docs.podman.io/en/latest/markdown/podman-image-trust.1.html) and image content source policy (https://www.redhat.com/sysadmin/manage-container-registries). Right now we have no way to block specific registries/repositories and force image pulls through our private ACR, which is the gap these features are meant to close. Podman also has a Docker API compatibility layer; the Docker API is compatible with Podman v4 and higher.

The problem with Podman is that it is fairly new, not fully supported by ADO, and allows too many customizations.

1. Requirements

I have made a list of the requirements we have for the solution:

  • customize the ADO agent runner to run podman
  • run podman as docker (some pipelines and ADO tasks use the docker command; redirect it to podman with all its arguments), with no changes required in source code (Dockerfile)
  • start the podman socket/service automatically for the AzDevOps user and emulate the docker socket (for templates)
  • forward container registry requests to the private ACR
  • whitelist allowed registries/repositories/tags, deny all others
  • test podman build speed; it should be at least as fast as docker build
  • docker API compatibility

2. Flow

podman policy flow


3. Solutions with ADO runner perspective

3.1. customize ADO agent runner to run podman

  • Since we want to stay with Ubuntu, the usable version is 22.04. Its official repository only provides podman 3.4, which is old and buggy, so we move to version 4.5.1 from the kubic repository.
# add the kubic repository and install podman, buildah, skopeo
install_packages=(podman buildah skopeo)
REPO_URL="https://download.opensuse.org/repositories/devel:kubic:libcontainers:unstable/xUbuntu_$(lsb_release -rs)"
mkdir -p /etc/apt/keyrings
# fetch the repository signing key and store it in binary (dearmored) form
curl -fsSL "$REPO_URL/Release.key" \
| gpg --dearmor | tee /etc/apt/keyrings/devel_kubic_libcontainers_unstable.gpg > /dev/null
echo \
"deb [arch=$(dpkg --print-architecture) signed-by=/etc/apt/keyrings/devel_kubic_libcontainers_unstable.gpg]\
    $REPO_URL/ /" \
| tee /etc/apt/sources.list.d/devel:kubic:libcontainers:unstable.list > /dev/null
apt-get update -qq
apt-get -qq -y install "${install_packages[@]}"
# configuration directory used by the later steps (nodocker, registries.conf, policy.json)
mkdir -p /etc/containers
  • The ADO agent runs as the user AzDevOps. We provision this user in advance and enable lingering for it so that its systemd user units start automatically. The agent uses (https://vstsagenttools.blob.core.windows.net/tools/ElasticPools/Linux/14/enableagent.sh)
#!/usr/bin/env bash
# set -euo pipefail
#uid=1001(AzDevOps) gid=1001(AzDevOps) groups=1001(AzDevOps),4(adm),27(sudo)
useradd -m -u 1001 AzDevOps
# usermod -a -G docker AzDevOps
usermod -a -G adm AzDevOps
usermod -a -G sudo AzDevOps
chmod -R +r /home
setfacl -Rdm "u:AzDevOps:rwX" /home
setfacl -Rb /home/AzDevOps
echo 'AzDevOps ALL=NOPASSWD: ALL' >> /etc/sudoers
loginctl enable-linger AzDevOps
echo "linger enabled"
ls -la /var/lib/systemd/linger
# set the XDG runtime dir for the AzDevOps user
cat <<EOF > /etc/profile.d/agent_env_vars.sh
export XDG_RUNTIME_DIR="/run/user/1001"
EOF
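
The provisioning steps above can be sanity-checked with a short script. This is a minimal sketch, not part of the original setup: the helper names version_ge and linger_enabled are hypothetical, the 4.0.0 minimum reflects the Podman v4 requirement mentioned earlier, and GNU sort (for -V) is assumed to be available.

```shell
#!/bin/sh
# Sketch only: helper names are hypothetical, not part of the agent image.

# Returns 0 when version $1 is greater than or equal to version $2.
# Relies on GNU sort -V (version sort).
version_ge() {
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Returns 0 when lingering is enabled for user $1.
# systemd records lingering users as files in /var/lib/systemd/linger;
# the directory can be overridden with $2 for testing.
linger_enabled() {
  [ -e "${2:-/var/lib/systemd/linger}/$1" ]
}

# Only query podman when it is actually installed on this machine.
if command -v podman >/dev/null 2>&1; then
  installed=$(podman version --format '{{.Client.Version}}')
  if version_ge "$installed" 4.0.0; then
    echo "podman $installed is new enough"
  else
    echo "podman $installed is older than 4.0.0" >&2
  fi
fi

if linger_enabled AzDevOps; then
  echo "linger enabled for AzDevOps"
else
  echo "linger not enabled for AzDevOps" >&2
fi
```

Running this at the end of the image build gives an early signal if the kubic repository stopped providing a v4 package or if enable-linger did not take effect.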

3.2. run podman as docker (some pipelines and ado tasks are using docker command, redirect it to podman with all the arguments), no changes are necessary in source code (Dockerfile)

#!/bin/bash -e

cat <<EOF > /usr/bin/docker && chmod +x /usr/bin/docker
#!/bin/sh
[ -f /etc/containers/nodocker ] || \
echo "Emulate Docker CLI using podman. Create /etc/containers/nodocker to quiet msg." >&2
exec /usr/bin/podman "\$@"
EOF
touch /etc/containers/nodocker
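
The important detail in the wrapper is the quoted "$@", which passes every argument through unchanged, including arguments containing spaces. The following sketch demonstrates this with a stub in place of the real podman binary; the temporary directory and the stub are assumptions made for illustration, and nothing is written to /usr/bin.

```shell
#!/bin/sh
# Sketch only: uses a throwaway directory and a stub "podman".
workdir=$(mktemp -d)

# Stub "podman" that prints each argument it receives on its own line.
cat <<'EOF' > "$workdir/podman"
#!/bin/sh
for arg in "$@"; do
  printf '%s\n' "$arg"
done
EOF
chmod +x "$workdir/podman"

# Same wrapper pattern as above, pointed at the stub instead of /usr/bin/podman.
cat <<EOF > "$workdir/docker"
#!/bin/sh
exec "$workdir/podman" "\$@"
EOF
chmod +x "$workdir/docker"

# The argument with a space must arrive as a single argument.
wrapper_output=$("$workdir/docker" build -t "my image:1" .)
printf '%s\n' "$wrapper_output"

rm -rf "$workdir"
```

If the wrapper used an unquoted $@ or $*, the tag argument would be split into two words and the stub would print five lines instead of four.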

3.3. Start podman socket/service automatically for AzDevOps user and emulate docker socket (for templates)

The Podman API is almost fully compatible with the Docker API, so the Docker socket can be emulated by creating a symbolic link that points to the Podman socket.

cat /usr/lib/systemd/user/podman.socket
[Unit]
Description=Podman API Socket
Documentation=man:podman-system-service(1)

[Socket]
ListenStream=%t/podman/podman.sock
SocketMode=0660

[Install]
WantedBy=sockets.target
#enable user units for podman and start them
systemctl enable --user podman.socket
systemctl enable --user podman.service
systemctl start --user podman.socket
systemctl start --user podman.service
#test socket
curl --unix-socket /run/user/1001/podman/podman.sock -v 'http://d/v4.5.1/libpod/images/json'
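#create symbolic link in well known destination
cat <<EOF >/usr/lib/tmpfiles.d/podman-docker.conf
L+  /run/docker.sock   -    -    -     -   /run/user/1001/podman/podman.sock
EOF
#apply the tmpfiles rule now instead of waiting for the next boot
systemd-tmpfiles --create /usr/lib/tmpfiles.d/podman-docker.conf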
#test socket
curl --unix-socket /run/docker.sock -v 'http://d/v4.5.1/libpod/images/json'

#call docker api from inside container 
podman run -v /var/run/docker.sock:/var/run/docker.sock  quay.io/podman/stable curl -s --unix-socket /var/run/docker.sock http://d/v4.5.1/libpod/info
docker run --rm -v /var/run/docker.sock:/var/run/docker.sock -v /tmp:/tmp registry.aquasec.com/scanner:2302.13.13 scan --local testimage:0.1.212 --no-verify 
#azureRunner profile, setup ENV variables
cat <<EOF > /etc/profile.d/agent_env_vars.sh
export XDG_RUNTIME_DIR="/run/user/1001"
export DOCKER_HOST="unix:///run/user/1001/podman/podman.sock"
export DOCKER_SOCK="/run/user/1001/podman/podman.sock"
EOF
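
Before a pipeline relies on DOCKER_HOST, it can help to confirm that the value actually points at a live Unix socket. The sketch below is an illustration under assumptions: the function names are hypothetical, and only the unix:// scheme used in this setup is handled.

```shell
#!/bin/sh
# Sketch only: function names are hypothetical.

# Prints the filesystem path for a unix:// DOCKER_HOST value.
# Returns 1 for any other scheme (tcp://, ssh://, ...).
socket_path_from_docker_host() {
  case "$1" in
    unix://*) printf '%s\n' "${1#unix://}" ;;
    *) return 1 ;;
  esac
}

# Reports whether the socket behind a DOCKER_HOST value exists.
check_docker_host() {
  sock=$(socket_path_from_docker_host "$1") || {
    echo "unsupported DOCKER_HOST: $1" >&2
    return 2
  }
  if [ -S "$sock" ]; then
    echo "socket present: $sock"
  else
    echo "socket missing: $sock" >&2
    return 1
  fi
}

# Fall back to the per-user podman socket when DOCKER_HOST is not set.
check_docker_host "${DOCKER_HOST:-unix:///run/user/$(id -u)/podman/podman.sock}" || true
```

A missing socket usually means podman.socket was not started for the user, or that lingering is not enabled and /run/user/1001 does not exist yet.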

3.4. forward Container Registries requests to private ACR

All requests pointing to the selected registries are redirected to the private ACR, following a naming convention (one repository namespace per upstream registry).

cat <<EOF > /etc/containers/registries.conf
unqualified-search-registries = ['docker.io', 'quay.io', 'gcr.io' ]

[[registry]]
prefix="quay.io"
location="devopsbasecr.azurecr.io/quay"

[[registry]]
prefix="docker.io"
location="devopsbasecr.azurecr.io/docker"

[[registry]]
prefix="gcr.io"
location="devopsbasecr.azurecr.io/gcr"

[[registry]]
prefix="mcr.microsoft.com"
location="devopsbasecr.azurecr.io/ms"
EOF
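
To make the naming convention concrete, the sketch below mimics the prefix-to-location substitution for the four entries above. It is an illustration only, not how Podman implements the lookup; the function name rewrite_image is hypothetical, and image references are assumed to be fully qualified (Podman normalizes short names such as alpine to docker.io/library/alpine before matching).

```shell
#!/bin/sh
# Illustration only: reproduces the prefix -> location mapping from
# registries.conf above for fully qualified image references.
rewrite_image() {
  image=$1
  case "$image" in
    quay.io/*)
      printf '%s\n' "devopsbasecr.azurecr.io/quay/${image#quay.io/}" ;;
    docker.io/*)
      printf '%s\n' "devopsbasecr.azurecr.io/docker/${image#docker.io/}" ;;
    gcr.io/*)
      printf '%s\n' "devopsbasecr.azurecr.io/gcr/${image#gcr.io/}" ;;
    mcr.microsoft.com/*)
      printf '%s\n' "devopsbasecr.azurecr.io/ms/${image#mcr.microsoft.com/}" ;;
    *)
      # No matching prefix: the reference is used as-is.
      printf '%s\n' "$image" ;;
  esac
}

rewrite_image docker.io/library/alpine:3.18
rewrite_image mcr.microsoft.com/dotnet/sdk:6.0
rewrite_image registry.aquasec.com/scanner:2302.13.13
```

The first two references resolve to devopsbasecr.azurecr.io/docker/library/alpine:3.18 and devopsbasecr.azurecr.io/ms/dotnet/sdk:6.0, while the Aqua scanner image is left untouched because no prefix matches.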

3.5. whitelist allowed registries/repository/tags, deny all other

Block all registries except the ones that are redirected, our private registries, and a few special registries (such as registry.aquasec.com).

cat <<EOF > /etc/containers/policy.json 
{
    "default": [
        {
            "type": "reject"
        }
    ],
    "transports": {
        "containers-storage": {
            "": [
                {
                    "type": "insecureAcceptAnything"
                }
            ]
        },
        "docker": {
            "registry.aquasec.com": [
                {
                    "type": "insecureAcceptAnything"
                }
            ],
            "devopscr.azurecr.io": [
                {
                    "type": "insecureAcceptAnything"
                }
            ],
            "prdopscr.azurecr.io": [
                {
                    "type": "insecureAcceptAnything"
                }
            ],
            "devappcr.azurecr.io": [
                {
                    "type": "insecureAcceptAnything"
                }
            ],
            "prdappcr.azurecr.io": [
                {
                    "type": "insecureAcceptAnything"
                }
            ],
            "devopsbasecr.azurecr.io": [
                {
                    "type": "insecureAcceptAnything"
                }
            ],
            "prdopsbasecr.azurecr.io": [
                {
                    "type": "insecureAcceptAnything"
                }
            ],
            "docker.io": [
                {
                    "type": "insecureAcceptAnything"
                }
            ],
            "gcr.io": [
                {
                    "type": "insecureAcceptAnything"
                }
            ],
            "quay.io": [
                {
                    "type": "insecureAcceptAnything"
                }
            ],
            "mcr.microsoft.com": [
                {
                    "type": "insecureAcceptAnything"
                }
            ]
        },
        "docker-daemon": {
            "": [
                {
                    "type": "insecureAcceptAnything"
                }
            ]
        }
    }
}
EOF
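
Because the policy repeats the same entry for every allowed registry, a bracket is easy to lose when editing by hand. One option is to generate the file from a list. This is a minimal sketch under assumptions: the function name generate_policy is hypothetical, the output uses a more compact layout than the file above, and it writes to stdout rather than /etc/containers/policy.json.

```shell
#!/bin/sh
# Sketch only: prints a reject-by-default policy that accepts the
# registries given as arguments. Output goes to stdout.
generate_policy() {
  printf '{\n'
  printf '    "default": [{"type": "reject"}],\n'
  printf '    "transports": {\n'
  printf '        "containers-storage": {"": [{"type": "insecureAcceptAnything"}]},\n'
  printf '        "docker-daemon": {"": [{"type": "insecureAcceptAnything"}]},\n'
  printf '        "docker": {\n'
  first=1
  for registry in "$@"; do
    # Separate entries with a comma, but do not emit a trailing one.
    [ "$first" -eq 1 ] || printf ',\n'
    first=0
    printf '            "%s": [{"type": "insecureAcceptAnything"}]' "$registry"
  done
  printf '\n        }\n    }\n}\n'
}

generate_policy registry.aquasec.com devopsbasecr.azurecr.io docker.io
```

The generated text can be redirected into a staging file and validated with any JSON parser before it replaces the live policy.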

3.6. test podman build speed; it should be at least as fast as docker build

Concerns about speed compared to Docker were not confirmed: Podman is just as fast, and in some cases faster. The important part is that podman needs to run with the overlay storage driver (OverlayFS), as reported by podman info:

  graphDriverName: overlay
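
A pipeline or image build can check the storage driver explicitly instead of reading the full podman info output. The sketch below assumes podman is on the PATH when the check is run for real; the function name require_overlay is hypothetical.

```shell
#!/bin/sh
# Sketch only: warns when the storage driver is not overlay.
require_overlay() {
  if [ "$1" = "overlay" ]; then
    echo "storage driver is overlay"
  else
    echo "storage driver is '$1', expected overlay; builds may be slow" >&2
    return 1
  fi
}

# Query the driver only when podman is installed on this machine.
if command -v podman >/dev/null 2>&1; then
  require_overlay "$(podman info --format '{{.Store.GraphDriverName}}')" || true
fi
```

A different driver such as vfs copies whole layers instead of stacking them, which is the usual reason for slow builds.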

3.7. Docker API compatibility with PODMAN (mitigation of DockerInstaller@0 task)

We can still use the Docker CLI with the Podman server because of the Docker API compatibility, so the move towards podman should be as seamless as possible.

#simulate docker API call
cat <<EOF >createcontainer
{
"Name": "abrakadabra",
"Image": "tst:1"
}
EOF
curl --unix-socket /run/user/1001/podman/podman.sock -H content-type:application/json  -X POST "http://localhost/v1.43/containers/create" -d @createcontainer

#the same with libpod
#simulate libpod APIcall (podman native)
curl -XPOST --unix-socket /run/user/${UID}/podman/podman.sock \
    -H content-type:application/json \
    http://d/v4.5.1/libpod/containers/create -d @createcontainer

The DockerInstaller@0 task is widely used in HEQ pipelines:

- task: DockerInstaller@0
  inputs:
    dockerVersion: '19.03.4'

With dockerVersion set as a variable, this task installs the Docker CLI and prepends its binary directory to $PATH. The next task that calls “docker” uses the modified $PATH, so it picks up the installed Docker CLI:

#DockerInstaller@0 flow
Caching tool: docker-stable 19.3.4 x64
Prepending PATH environment variable with directory: /opt/hostedtoolcache/docker-stable/19.3.4/x64
Verifying docker installation...
/opt/hostedtoolcache/docker-stable/19.3.4/x64/docker --version

PATH=/opt/hostedtoolcache/docker-stable/19.3.4/x64:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/snap/bin

Setup now looks like:

Client: Docker Engine - Community
 Version:           19.03.4
 API version:       1.40
 Go version:        go1.12.10
 Git commit:        9013bf583a
 Built:             Fri Oct 18 15:49:05 2019
 OS/Arch:           linux/amd64
 Experimental:      false

Server: linux/amd64/ubuntu-22.04
 Podman Engine:
  Version:          4.5.1
  APIVersion:       4.5.1
  Arch:             amd64
  BuildTime:        1970-01-01T00:00:00Z
  Experimental:     false
  GitCommit:
  GoVersion:        go1.18.1
  KernelVersion:    5.15.0-1041-azure
  MinAPIVersion:    4.0.0
  Os:               linux
 Conmon:
  Version:          conmon version 2.1.7, commit:
  Package:          conmon_2:2.1.7-0ubuntu22.04+obs15.52_amd64
 OCI Runtime (crun):
  Version:          crun version 1.8.4
commit: 5a8fa99a5e41facba2eda4af12fa26313918805b
rundir: /run/user/1001/crun
spec: 1.0.0
+SYSTEMD +SELINUX +APPARMOR +CAP +SECCOMP +EBPF +YAJL
  Package:          crun_101:1.8.4-0ubuntu22.04+obs55.14_amd64
 Engine:
  Version:          4.5.1
  API version:      1.41 (minimum version 1.24)
  Go version:       go1.18.1
  Git commit:
  Built:            Thu Jan  1 00:00:00 1970
  OS/Arch:          linux/amd64
  Experimental:     false

In this particular example docker build will fail with:

level=error msg="failed to dial gRPC: unable to upgrade to h2c, received 404"

The problem is an API incompatibility caused by the Docker CLI version: the client must speak Docker API version >= 1.41, which means the Docker CLI needs to be version 20.10 or newer.

#tested docker CLI versions
#all DockerInstaller@0 tasks need to be upgraded to a higher dockerVersion
dockerVersion: '20.10.0'
dockerVersion: '24.0.0'

Effectively, the Docker client will use the Podman engine as its backend, and all the requirements defined earlier are still met.
I also tried other ways to mitigate the possibility of installing an “unsupported” Docker CLI, but from the ADO perspective this is not feasible.

Without the DockerInstaller@0 task, the setup uses podman only:

Client:       Podman Engine
Version:      4.5.1
API Version:  4.5.1
Go Version:   go1.18.1
Built:        Thu Jan  1 00:00:00 1970
OS/Arch:      linux/amd64

The recommendation is to remove the DockerInstaller@0 task from all pipelines, or to change the Docker CLI version in these tasks to a higher (tested) version.
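
Finding the affected pipelines can be automated. The sketch below scans YAML files for literal dockerVersion values older than 20.10. It is an illustration under assumptions: the function name is hypothetical, versions passed through pipeline variables are not resolved, and the demo runs on throwaway sample files rather than a real repository.

```shell
#!/bin/sh
# Sketch only: reports dockerVersion values below the tested minimum.
min_major=20
min_minor=10

# Prints one line per outdated dockerVersion found under directory $1.
find_old_docker_versions() {
  find "$1" -type f \( -name '*.yml' -o -name '*.yaml' \) | while IFS= read -r file; do
    # Extract "full-version major minor"; leading zeros of the minor
    # part are dropped so that "03" compares as 3.
    sed -nE "s/^[[:space:]]*dockerVersion:[[:space:]]*['\"]?(([0-9]+)\.0*([0-9]+)[0-9.]*).*/\1 \2 \3/p" "$file" |
    while read -r version major minor; do
      if [ "$major" -lt "$min_major" ] ||
         { [ "$major" -eq "$min_major" ] && [ "$minor" -lt "$min_minor" ]; }; then
        printf '%s: dockerVersion %s is below %s.%s\n' \
          "$file" "$version" "$min_major" "$min_minor"
      fi
    done
  done
}

# Demo on two sample pipeline files in a temporary directory.
demo_dir=$(mktemp -d)
printf '%s\n' '- task: DockerInstaller@0' '  inputs:' \
  "    dockerVersion: '19.03.4'" > "$demo_dir/old.yml"
printf '%s\n' '- task: DockerInstaller@0' '  inputs:' \
  "    dockerVersion: '20.10.0'" > "$demo_dir/new.yml"

report=$(find_old_docker_versions "$demo_dir")
printf '%s\n' "$report"
rm -rf "$demo_dir"
```

In the demo only old.yml is reported. Pointing the function at a checkout of the pipeline repositories gives the list of files that need the version bump or the task removal.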

4. Azure DevOps pools

For Azure DevOps runners we were using pools based on “Ubuntu 18.04”, together with the Azure Dependency Agent for monitoring. The runner was upgraded to Ubuntu 22.04 with the Azure Monitor (https://learn.microsoft.com/en-us/azure/virtual-machines/monitor-vm) extension.

Podman configuration

#some tweaks for limits
cat <<EOF > /etc/containers/containers.conf
[containers]
default_ulimits = [
  "nofile=10000:20000",
]
EOF
#podman info
host:
  arch: amd64
  buildahVersion: 1.30.0
  cgroupControllers:
  - memory
  - pids
  cgroupManager: systemd
  cgroupVersion: v2
  conmon:
    package: conmon_2:2.1.7-0ubuntu22.04+obs15.50_amd64
    path: /usr/bin/conmon
    version: 'conmon version 2.1.7, commit: '
  cpuUtilization:
    idlePercent: 98.16
    systemPercent: 0.61
    userPercent: 1.23
  cpus: 4
  databaseBackend: boltdb
  distribution:
    codename: jammy
    distribution: ubuntu
    version: "22.04"
  eventLogger: journald
  hostname: C1-PLT-DEV-OPS-rst-vmss00000P
  idMappings:
    gidmap:
    - container_id: 0
      host_id: 1001
      size: 1
    - container_id: 1
      host_id: 165536
      size: 65536
    uidmap:
    - container_id: 0
      host_id: 1001
      size: 1
    - container_id: 1
      host_id: 165536
      size: 65536
  kernel: 5.15.0-1041-azure
  linkmode: dynamic
  logDriver: journald
  memFree: 13892984832
  memTotal: 16767574016
  networkBackend: netavark
  ociRuntime:
    name: crun
    package: crun_101:1.8.4-0ubuntu22.04+obs55.14_amd64
    path: /usr/bin/crun
    version: |-
      crun version 1.8.4
      commit: 5a8fa99a5e41facba2eda4af12fa26313918805b
      rundir: /run/user/1001/crun
      spec: 1.0.0
      +SYSTEMD +SELINUX +APPARMOR +CAP +SECCOMP +EBPF +YAJL
  os: linux
  remoteSocket:
    path: /run/user/1001/podman/podman.sock
  security:
    apparmorEnabled: false
    capabilities: CAP_CHOWN,CAP_DAC_OVERRIDE,CAP_FOWNER,CAP_FSETID,CAP_KILL,CAP_NET_BIND_SERVICE,CAP_SETFCAP,CAP_SETGID,CAP_SETPCAP,CAP_SETUID,CAP_SYS_CHROOT
    rootless: true
    seccompEnabled: true
    seccompProfilePath: /usr/share/containers/seccomp.json
    selinuxEnabled: false
  serviceIsRemote: false
  slirp4netns:
    executable: /usr/bin/slirp4netns
    package: slirp4netns_1.2.0-0ubuntu22.04+obs10.87_amd64
    version: |-
      slirp4netns version 1.2.0
      commit: 656041d45cfca7a4176f6b7eed9e4fe6c11e8383
      libslirp: 4.6.1
      SLIRP_CONFIG_VERSION_MAX: 3
      libseccomp: 2.5.3
  swapFree: 4294963200
  swapTotal: 4294963200
  uptime: 0h 13m 6.00s
plugins:
  authorization: null
  log:
  - k8s-file
  - none
  - passthrough
  - journald
  network:
  - bridge
  - macvlan
  - ipvlan
  volume:
  - local
registries: {}
store:
  configFile: $HOME/.config/containers/storage.conf
  containerStore:
    number: 0
    paused: 0
    running: 0
    stopped: 0
  graphDriverName: overlay
  graphOptions: {}
  graphRoot: /home/AzDevOps/.local/share/containers/storage
  graphRootAllocated: 103865303040
  graphRootUsed: 25445507072
  graphStatus:
    Backing Filesystem: extfs
    Native Overlay Diff: "true"
    Supports d_type: "true"
    Using metacopy: "false"
  imageCopyTmpDir: /var/tmp
  imageStore:
    number: 9
  runRoot: /tmp/containers-user-1001/containers
  transientStore: false
  volumePath: /home/AzDevOps/.local/share/containers/storage/volumes
version:
  APIVersion: 4.5.1
  Built: 0
  BuiltTime: Thu Jan  1 00:00:00 1970
  GitCommit: ""
  GoVersion: go1.18.1
  Os: linux
  OsArch: linux/amd64
  Version: 4.5.1

PROBLEMS

1. MSBUILD (SOLVED - disable /proc mount)

We hit problems with MSBuild for dotnet: some of our containers (Azure Functions) need to be rebuilt, and they use MSBuild, where no runtime other than Docker is supported. Initially I had no solution for this case when using podman; as the heading notes, it was resolved by disabling the /proc mount. https://learn.microsoft.com/en-us/visualstudio/msbuild/msbuild?view=vs-2022

#code that will fail
dotnet publish -v q /p:BuildNumber=$BUILD_NUMBER /p:CommitHash=$HOST_COMMIT src/WebJobs.Script.WebHost/WebJobs.Script.WebHost.csproj -c Release --output /azure-functions-host --runtime linux-x64 && \

Unhandled exception: System.NullReferenceException: Object reference not set to an instance of an object.
   at Microsoft.DotNet.Cli.Utils.Muxer..ctor()
   at Microsoft.DotNet.Cli.Utils.MSBuildForwardingAppWithoutLogging..ctor(IEnumerable`1 argsToForward, String msbuildPath)
   at Microsoft.DotNet.Tools.MSBuild.MSBuildForwardingApp..ctor(IEnumerable`1 argsToForward, String msbuildPath)
   at Microsoft.DotNet.Tools.RestoringCommand..ctor(IEnumerable`1 msbuildArgs, Boolean noRestore, String msbuildPath, String userProfileDir, Boolean advertiseWorkloadUpdates)
   at Microsoft.DotNet.Tools.Publish.PublishCommand.FromParseResult(ParseResult parseResult, String msbuildPath)
   at Microsoft.DotNet.Tools.Publish.PublishCommand.Run(ParseResult parseResult)
   at Microsoft.DotNet.Cli.ParseResultCommandHandler.Invoke(InvocationContext context)
   at System.CommandLine.Invocation.InvocationPipeline.<>c__DisplayClass4_0.<<BuildInvocationChain>b__0>d.MoveNext()
--- End of stack trace from previous location ---
   at System.CommandLine.CommandLineBuilderExtensions.<>c__DisplayClass12_0.<<UseHelp>b__0>d.MoveNext()
--- End of stack trace from previous location ---
   at System.CommandLine.CommandLineBuilderExtensions.<>c.<<UseSuggestDirective>b__18_0>d.MoveNext()
--- End of stack trace from previous location ---
   at System.CommandLine.CommandLineBuilderExtensions.<>c__DisplayClass16_0.<<UseParseDirective>b__0>d.MoveNext()
--- End of stack trace from previous location ---
   at System.CommandLine.CommandLineBuilderExtensions.<>c__DisplayClass8_0.<<UseExceptionHandler>b__0>d.MoveNext()

2. LOCAL transport when using policy (SOLVED)

        "containers-storage": {
            "": [
                {
                    "type": "insecureAcceptAnything"
                }
            ]
        },

I opened an issue where the whole problem is described: https://github.com/containers/podman/issues/19128