
Troubleshooting

This section provides solutions to issues you might encounter with ZaneOps.

ZaneOps typically takes less than 5 minutes to start. If it’s stuck longer than that, it may be due to several possible issues.

# make deploy
====== Deploying ZaneOps with HTTPS 🔒 ======
# other logs...
🏁 Deploy initiated
Waiting for all services to reach desired state, this should take less than 5 minutes...

To check that ZaneOps started correctly:

  1. You need to verify that all ZaneOps docker services started correctly:

    docker service ls --filter label="zane.stack=true"

    The output should look like the following. A REPLICAS value of 1/1 means the service is running; the only exception is zane_temporal-admin-tools, a job that should show 0/1 (1/1 completed):

    ID             NAME                            MODE             REPLICAS              IMAGE                                                 PORTS
    inpyph774s3l   zane_app                        replicated       1/1                   ghcr.io/zane-ops/app:canary
    otewzijsmzaj   zane_db                         replicated       1/1                   postgres:16-alpine
    adv0qcj7fqga   zane_fluentd                    replicated       1/1                   fluentd:v1.16.2-1.1
    fa7t7r508bd9   zane_loki                       replicated       1/1                   grafana/loki:3.4
    kisb92em589i   zane_pgbouncer                  replicated       1/1                   edoburu/pgbouncer:v1.23.1-p3
    t432gf67whmx   zane_proxy                      replicated       1/1                   ghcr.io/zane-ops/proxy:canary
    ux5vuucsm9cv   zane_temporal-admin-tools       replicated job   0/1 (1/1 completed)   temporalio/admin-tools:1.24.2-tctl-1.18.1-cli-1.0.0
    flwwyuihygay   zane_temporal-main-worker       replicated       1/1                   ghcr.io/zane-ops/app:canary
    wldqkpahhuyr   zane_temporal-schedule-worker   replicated       1/1                   ghcr.io/zane-ops/app:canary
    m8wk18qer4ys   zane_temporal-server            replicated       1/1                   temporalio/auto-setup:1.24.2
    limmbrl9o0ub   zane_valkey                     replicated       1/1                   valkey/valkey:7.2.5-alpine
  2. If one of the services is not starting correctly, you can check the tasks for the service with:

    docker service ps zane_{service} # ex: `zane_app`

    You should get an output similar to this where the latest task (the first task from the top) state is Running:

    ID             NAME            IMAGE                         NODE    DESIRED STATE   CURRENT STATE          ERROR   PORTS
    7b21smevkj6l   zane_app.1      ghcr.io/zane-ops/app:canary   ubs01   Running         Running 2 hours ago
    ppfheeofce3t   \_ zane_app.1   ghcr.io/zane-ops/app:canary   ubs01   Shutdown        Shutdown 2 hours ago
  3. If the latest task state shows Failed, inspect the task to see why it failed:

    docker inspect --format '{{ json .Status }}' <TASK-ID> | jq # ex: `7b21smevkj6l` the ID above

    You will get a JSON object like this:

    {
      "Timestamp": "2025-07-16T19:31:21.277975012Z",
      "State": "failed",
      "Message": "started",
      "Err": "No such container: zane_app.1.ypqk8horkrofmsij7r6xtn7kf",
      // ... other fields
    }

    The Err field might have a full explanation of the error.

    Examples:

    • “no suitable node (host-mode port already in use on 1 node)”: meaning the port is already used by another service
    • “invalid pool request: Pool overlaps with other one in this address space”: meaning the address pool (subnet) is already used by another network
  4. You can also see the application logs of the services using:

    docker service logs zane_{service}
  5. Still stuck? Create an issue.
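
The check in step 1 can be scripted. The following is a minimal sketch, assuming the `{{.Name}}` and `{{.Replicas}}` placeholders of `docker service ls --format`; the function name `find_unhealthy_services` is hypothetical and not part of ZaneOps. It reads `NAME REPLICAS` lines and prints every service whose running count differs from its desired count, treating a job as healthy once all of its runs have completed.

```shell
# Hypothetical helper (not part of ZaneOps): reads "NAME REPLICAS" lines on
# stdin and prints the services that are not in their desired state.
find_unhealthy_services() {
  awk '
    NF >= 2 {
      name = $1
      if ($0 ~ /completed\)/) {
        # Job, e.g. "0/1 (1/1 completed)": compare the completed/total part.
        done_part = $3
        sub(/^\(/, "", done_part)
        split(done_part, c, "/")
        if (c[1] != c[2]) print name
      } else {
        # Regular service, e.g. "1/1": running vs. desired replicas.
        split($2, r, "/")
        if (r[1] != r[2]) print name
      }
    }
  '
}

# Usage on a ZaneOps host:
# docker service ls --filter label="zane.stack=true" \
#   --format '{{.Name}} {{.Replicas}}' | find_unhealthy_services
```

Any name it prints is a candidate for `docker service ps` in step 2.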

Something is running on either port 80 or 443


ZaneOps’ proxy (Caddy) uses ports 80 & 443 to expose applications. If another app is using these ports, ZaneOps won’t start.
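
To find out whether something is already bound to those ports, you can filter the list of listening sockets. This is a minimal sketch, assuming a Linux host with `ss` (from iproute2) available; the function name `ports_in_use` is hypothetical. It reads `ss -ltn` output, including its header line, and prints 80 and/or 443 when a listening socket uses that port.

```shell
# Hypothetical helper: reads `ss -ltn` output on stdin (header included) and
# prints 80 and/or 443 if a listening socket is bound to that port.
ports_in_use() {
  awk '
    NR > 1 {
      # Column 4 is "Local Address:Port"; the port is the last ":" field.
      n = split($4, parts, ":")
      port = parts[n]
      if (port == 80 || port == 443) seen[port] = 1
    }
    END {
      if (80 in seen) print 80
      if (443 in seen) print 443
    }
  '
}

# Usage: ss -ltn | ports_in_use
```

Running `ss -ltnp` as root additionally shows which process owns each socket.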

If you must use another proxy alongside ZaneOps, you can run it as a service inside ZaneOps and proxy requests internally using the service’s network alias:

running a proxy inside Zaneops
/etc/nginx/nginx.conf
server {
    listen 80;
    server_name xyz.com;
    location / {
        proxy_pass http://service-xyz.zaneops.internal:80;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;
    }
}

ZaneOps services share a common Docker network named zane. If another network already uses its subnet, ZaneOps won’t start.

First list all docker networks and their subnets:

docker network ls -q | xargs docker network inspect -f '{{.Name}}: {{range .IPAM.Config}}{{.Subnet}}{{end}}'

If there is a conflict, the zane network and another network will show the same subnet:

bridge: 10.0.0.0/24
host:
none:
my_network: 10.0.1.0/24
zane: 10.0.1.0/24
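
Spotting the duplicate by eye gets harder when a host has many networks. The sketch below is one way to filter the `name: subnet` listing, assuming exactly the output format of the listing command above; the function name `find_zane_subnet_conflicts` is hypothetical. It only reports networks that declare the identical subnet and does not detect ranges that partially overlap.

```shell
# Hypothetical helper: reads "name: subnet" lines on stdin and prints every
# network, other than zane, that declares exactly the same subnet as zane.
find_zane_subnet_conflicts() {
  awk -F': ' '
    { subnet[$1] = $2 }
    END {
      target = subnet["zane"]
      if (target == "") exit          # no zane network, or it has no subnet
      for (name in subnet)
        if (name != "zane" && subnet[name] == target) print name
    }
  '
}

# Usage:
# docker network ls -q \
#   | xargs docker network inspect -f '{{.Name}}: {{range .IPAM.Config}}{{.Subnet}}{{end}}' \
#   | find_zane_subnet_conflicts
```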

Stop all remaining ZaneOps processes:

# assuming you are in /var/www/zaneops
make stop

Then delete the zane network:

docker network rm zane

Recreate the network with a subnet that is not in use, for example 172.28.0.0/16:

docker network create --attachable --driver overlay --label zane.stack=true --subnet 172.28.0.0/16 zane

Finally, restart ZaneOps:

# assuming you are in /var/www/zaneops
make deploy

You are connected to GitHub Container Registry (ghcr.io)


ZaneOps images are hosted publicly on GitHub Container Registry. Authentication conflicts may occur if you are logged in with a token that is no longer valid.

To check if you are connected:

cat ~/.docker/config.json

If ghcr.io appears under the auths key, you are authenticated to GitHub Container Registry:

{
  "auths": {
    "https://ghcr.io": {}
  },
  // other keys ...
}
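
The same check can be scripted. This is a minimal sketch with a hypothetical function name, `has_ghcr_login`; it performs a coarse text match on the config file rather than parsing the JSON, so it also matches ghcr.io entries outside the auths key.

```shell
# Hypothetical helper: exits 0 when the given Docker config file mentions
# ghcr.io. Coarse text match, not a JSON-aware lookup of the auths key.
has_ghcr_login() {
  config_file="${1:-$HOME/.docker/config.json}"
  [ -f "$config_file" ] && grep -q 'ghcr\.io' "$config_file"
}

# Usage:
# if has_ghcr_login; then
#   echo "Stored ghcr.io entry found"
# fi
```

If the stored token turns out to be stale, `docker logout ghcr.io` removes the stored credentials for that registry. Whether that alone resolves the startup problem is an assumption here.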

Note that you can also deploy any application hosted on a private container registry by specifying the credentials in the source section of the service settings page:

Docker credentials

You might get this error if you have a VPN installed:

$ docker inspect --format '{{ json .Status }}' <task-id> | jq
{
  "Timestamp": "2025-07-16T19:31:21.277975012Z",
  "State": "failed",
  "Message": "started",
  "Err": "network sandbox join failed: subnet sandbox join failed for \"10.0.1.0/24\": error creating vxlan interface: operation not supported",
  # ... other fields
}

This error means Docker couldn’t create a VXLAN interface, usually because of a missing kernel module or a conflict with the VPN.
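
To rule out the kernel-module cause, you can check whether the vxlan module is loaded. This is a minimal sketch, assuming a Linux host where `/proc/modules` is readable; the function name `vxlan_loaded` is hypothetical. A kernel with VXLAN support compiled in (rather than built as a module) will not list it, so a negative result is only a hint.

```shell
# Hypothetical helper: reads /proc/modules-style lines on stdin and exits 0
# when a module named vxlan is listed as loaded.
vxlan_loaded() {
  awk '$1 == "vxlan" { found = 1 } END { exit !found }'
}

# Usage:
# vxlan_loaded < /proc/modules || sudo modprobe vxlan
```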

Service metrics aren’t shown in the dashboard


If your app deployed successfully but no metrics appear (usually after ~30s), it may be because cgroups are not enabled on your server.

Empty metrics
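
One way to check the cgroup side is to look at the controllers the kernel reports. This is a minimal sketch, assuming a Linux host that exposes `/proc/cgroups` in its usual four-column layout (`subsys_name hierarchy num_cgroups enabled`); the function name `disabled_cgroup_controllers` is hypothetical. It prints each controller whose enabled column is 0.

```shell
# Hypothetical helper: reads /proc/cgroups-style lines on stdin and prints the
# name of every controller whose fourth column ("enabled") is 0.
disabled_cgroup_controllers() {
  awk 'NF >= 4 && $1 !~ /^#/ && $4 == 0 { print $1 }'
}

# Usage:
# disabled_cgroup_controllers < /proc/cgroups
```

If `memory` or `cpu` shows up in the output, that would be consistent with the missing metrics described above.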