n8n Monitoring with Prometheus & Grafana on Docker Compose (Traefik, Exporters) — with One-Script Automation

n8n shines when your business relies on dozens of API calls, webhooks, and scheduled jobs ticking away 24/7. But the more workflows you run, the easier it is for silent failures to hide—auth keys expire, APIs rate-limit, workers get saturated, queues back up. Observability turns that guesswork into a clear picture: you’ll spot regressions earlier, fix incidents faster, and scale with confidence (especially when you move beyond a single container to workers and Redis).

Why you need monitoring for n8n

  • Silent failures happen. Tokens expire, APIs rate-limit, workers stall, and queues back up—often with no obvious error in the editor.
  • Incidents need context. When “n8n is slow,” you must know if it’s workflow errors, Redis pressure, Postgres saturation, or host CPU.
  • Scaling without data is risky. Queue-mode only shines when you can see wait time, p95 execution, and worker load to right-size capacity.
  • Upgrades & changes add risk. New versions, new workflows, and traffic spikes demand early warning, not post-mortems.

By the end of this guide you’ll have a production-ready monitoring stack for n8n on an Ubuntu VPS using Docker Compose and Traefik—with clear dashboards, practical alerts, and an optional one-script automation path.

Architecture Overview

A complete, production-grade monitoring stack needs four capabilities: collection (exporters + app metrics), storage (time-series DB), alerting, and visualization. Below is what we use for n8n on Ubuntu + Docker Compose, and why each piece matters.

Component | What it does | Why we need it
--- | --- | ---
Traefik (edge) | Reverse proxy, TLS (Let’s Encrypt), routing to n8n.<your-domain> and grafana.<your-domain>; optional Basic Auth for dashboards. | Secure public entrypoint on 80/443; keeps Grafana protected and simplifies domain management.
Redis (BullMQ) | Queue backend for n8n (queue mode only); tracks waiting/active/failed jobs and latency. | Queue depth and evictions are leading indicators of incidents; we monitor them to prevent backlog and timeouts.
Postgres | Primary database for n8n (workflows, credentials, execution data). | Saturation, locks, or connection spikes directly degrade reliability; DB health must be first-class.
Prometheus | Pulls /metrics from n8n, Traefik, and exporters; stores time-series. | Single source of truth for operational signals; enables alerting based on real data.
Grafana (Dashboards & Alerting) | Visualizes key panels (success rate, p95 duration, queue depth, infra); manages alert rules and contact points. | Human-friendly operations view and notifications to Email/Slack/Telegram without exposing Prometheus.
Node Exporter | Host-level CPU/RAM/disk/filesystem metrics. | Detects VPS pressure and capacity limits before containers fail; informs right-sizing.
cAdvisor | Per-container CPU/memory/restarts metrics. | Finds “noisy neighbors,” memory leaks, and crash loops quickly.
Postgres Exporter | DB-specific metrics (connections, locks, cache hit, saturation). | Early warning on DB bottlenecks and misconfiguration.
Redis Exporter | Redis metrics (memory, evictions, ops/sec, latency). | Protects queue-mode throughput; alerts before job wait times spike.

n8n Queue Mode Observability Architecture

Users reach the stack over HTTPS via Traefik, which terminates TLS and routes app traffic to n8n Main. n8n persists state in PostgreSQL and enqueues jobs in Redis; Workers and the Task Runner consume from Redis, execute workflows, and write results back to PostgreSQL.
Prometheus scrapes metrics from Traefik, n8n, and the exporters, while Grafana visualizes them on dashboards protected by Traefik Basic Auth.
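
The scrape paths described above can be sketched as a list of internal endpoints. This is an illustration, not the contents of the repository's monitoring/prometheus.yml (which is not shown here): the host names are the Compose service names used later in this guide, Traefik's port 8081 comes from its metrics entrypoint, n8n serves /metrics on its own port when N8N_METRICS=true, and the exporter ports are assumed to be each exporter's default. The function name `scrape_targets` is hypothetical.

```shell
#!/usr/bin/env bash
# scrape_targets MODE -> prints "job host:port" lines for MODE (single|queue).
# Hosts are Compose service names; exporter ports are the upstream defaults.
scrape_targets() {
  cat <<'EOF'
traefik traefik:8081
n8n main:5678
postgres-exporter postgres-exporter:9187
cadvisor cadvisor:8080
node-exporter node-exporter:9100
EOF
  # Redis and its exporter only exist in queue mode.
  if [ "$1" = "queue" ]; then
    echo "redis-exporter redis-exporter:9121"
  fi
}

# Print the metrics URL Prometheus would pull for each job.
scrape_targets "${1:-single}" | while read -r job target; do
  printf '%-18s http://%s/metrics\n' "$job" "$target"
done
```

None of these ports are published on the host; they are only reachable on the internal Docker network, which is why Prometheus can scrape them while the outside world cannot.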

[Figure: n8n Queue Mode Observability Architecture]

n8n Single Mode Observability Architecture

[Figure: n8n Single Mode Observability Architecture]

Step-by-Step: Install n8n with Prometheus & Grafana Monitoring

1. Prerequisites

Ubuntu VPS: 22.04+ recommended.

Sizing (minimums):

  • Single mode: 1 vCPU, 2 GB RAM, 20 GB disk.
  • Queue mode (main + workers): 2–4 vCPU, 4–8 GB RAM, 40 GB+ disk (scale workers with load).
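
A quick pre-flight check can compare the VPS against these minimums before anything is installed. The sketch below is one possible way to do it, assuming a Linux host with `nproc`, `/proc/meminfo`, and `df`; the helper names `at_least` and `report` are made up for this example, and the RAM/disk thresholds are deliberately set about 10% under the nominal figures because a "2 GB" plan reports slightly less usable memory.

```shell
#!/usr/bin/env bash
# Pre-flight sizing check against the minimums listed above.

# at_least ACTUAL REQUIRED -> succeeds when ACTUAL >= REQUIRED (integers).
at_least() {
  [ "$1" -ge "$2" ]
}

# report LABEL ACTUAL REQUIRED UNIT -> prints one OK/LOW line.
report() {
  if at_least "$2" "$3"; then
    echo "OK   $1: $2 $4 (minimum $3)"
  else
    echo "LOW  $1: $2 $4 (minimum $3)"
  fi
}

mode="${1:-single}"   # "single" or "queue"
# Assumption: RAM and disk thresholds sit about 10% under the nominal sizes,
# because MemTotal and the root filesystem report less than the plan's label.
case "$mode" in
  queue) min_cpu=2; min_ram_mb=3600; min_disk_gb=36 ;;
  *)     min_cpu=1; min_ram_mb=1800; min_disk_gb=18 ;;
esac

cpus=$(nproc 2>/dev/null)
ram_mb=$(awk '/^MemTotal:/ {print int($2 / 1024)}' /proc/meminfo 2>/dev/null)
disk_gb=$(df -Pk / 2>/dev/null | awk 'NR==2 {print int($2 / 1048576)}')

report "vCPU" "${cpus:-0}" "$min_cpu" "cores"
report "RAM" "${ram_mb:-0}" "$min_ram_mb" "MB"
report "Disk" "${disk_gb:-0}" "$min_disk_gb" "GB"
```

Run it with `queue` as the first argument to check against the queue-mode minimums instead.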

Domain & DNS

Create DNS records pointing to your VPS public IP:

  • n8n.<your-domain> → A/AAAA to server IP
  • grafana.<your-domain> → A/AAAA to server IP
  • (Optional) prometheus.<your-domain> → A/AAAA to server IP, only if you choose to expose it (default is private).
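
Let's Encrypt can only issue certificates once these records resolve to the server, so it is worth confirming them before starting Traefik. A minimal sketch, assuming a glibc-based host where `getent ahosts` is available; the function names and the example IP 203.0.113.10 are placeholders for illustration only.

```shell
#!/usr/bin/env bash
# lookup_ips HOST -> prints the addresses HOST resolves to, one per line.
lookup_ips() {
  getent ahosts "$1" 2>/dev/null | awk '{print $1}' | sort -u
}

# points_to HOST IP -> succeeds when IP is among HOST's resolved addresses.
points_to() {
  lookup_ips "$1" | grep -Fxq "$2"
}

# Usage: ./check-dns.sh <domain> <server-ip>
# The check only runs when both arguments are given.
if [ "$#" -ge 2 ]; then
  domain="$1"
  server_ip="$2"
  for sub in n8n grafana; do
    fqdn="${sub}.${domain}"
    if points_to "$fqdn" "$server_ip"; then
      echo "OK      ${fqdn} -> ${server_ip}"
    else
      echo "MISSING ${fqdn} does not resolve to ${server_ip}"
    fi
  done
fi
```

For example, `./check-dns.sh example.com 203.0.113.10` checks both n8n.example.com and grafana.example.com. Freshly created records can take a while to propagate, so a MISSING result right after a DNS change is not necessarily an error.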

TLS & email

  • Let’s Encrypt email for Traefik (renewal notices).
  • Decide notification channels for alerts: Email/SMTP, Slack, Telegram (you’ll plug these into Grafana).

2. Clone my n8n-toolkit repo

# Option 1 — Developers (Git)
git clone https://github.com/thenguyenvn90/n8n-toolkit.git
cd n8n-toolkit

# Option 2 — Download as ZIP
sudo apt update && sudo apt install -y unzip
curl -L -o n8n-toolkit.zip https://github.com/thenguyenvn90/n8n-toolkit/archive/refs/heads/main.zip
unzip n8n-toolkit.zip
cd n8n-toolkit-main

root@ubuntu-s-1vcpu-1gb-sgp1-01:~/n8n-toolkit# tree
.
├── LICENSE
├── README.md
├── common.sh
├── monitoring
│   ├── grafana
│   │   └── provisioning
│   │       ├── alerts
│   │       │   └── n8n_grafana_alerts.json
│   │       ├── dashboards
│   │       │   ├── Cadvisor-exporter-14282.json
│   │       │   ├── Node-Exporter-Full-1860.json
│   │       │   ├── PostgreSQL-Database-9628.json
│   │       │   ├── Redis-Dashboard-Prometheus-Redis-Exporter-1.x-763.json
│   │       │   ├── Traefik-2.2-12250.json
│   │       │   ├── dashboards.yml
│   │       │   ├── n8n-Essentials.json
│   │       │   ├── n8n-Queue-Mode-Health-Essentials.json
│   │       │   ├── n8n-Queue-Mode-Health-Full.json
│   │       │   ├── n8n–Queue-Mode-Health-Essentials.json
│   │       │   └── n8n–Queue-Mode-Health-Full.json
│   │       └── datasources
│   │           └── datasource.yml
│   └── prometheus.yml
├── n8n_manager.sh
├── queue-mode
│   └── docker-compose.yml
└── single-mode
    └── docker-compose.yml

3. Install Docker and Docker Compose

sudo apt update && sudo apt upgrade -y

sudo apt-get install -y ca-certificates curl gnupg lsb-release

# Add Docker’s official GPG key
sudo mkdir -p /etc/apt/keyrings
curl -fsSL https://download.docker.com/linux/ubuntu/gpg | \
  sudo gpg --dearmor -o /etc/apt/keyrings/docker.gpg

# Add the Docker repository
echo \
  "deb [arch=$(dpkg --print-architecture) signed-by=/etc/apt/keyrings/docker.gpg] \
  https://download.docker.com/linux/ubuntu $(lsb_release -cs) stable" | \
  sudo tee /etc/apt/sources.list.d/docker.list > /dev/null

# Install Docker Engine and Compose v2
sudo apt-get update
sudo apt-get install -y docker-ce docker-ce-cli containerd.io docker-compose-plugin

# Allow user to run Docker without sudo
sudo usermod -aG docker ${USER}
# Register the `docker` group membership with current session without changing your primary group
exec sg docker newgrp
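
Before moving on, it helps to confirm that the engine, the Compose v2 plugin, and the non-sudo group membership all work. The following is a small sketch of such a check; `have_cmd` is a helper name invented for this example, and the script only reports problems rather than fixing them.

```shell
#!/usr/bin/env bash
# have_cmd NAME -> succeeds when NAME is found on PATH.
have_cmd() {
  command -v "$1" >/dev/null 2>&1
}

if have_cmd docker; then
  docker --version
  # Compose v2 is a CLI plugin, so it is invoked as "docker compose".
  if ! docker compose version; then
    echo "Compose plugin missing: install docker-compose-plugin"
  fi
  # "docker info" talks to the daemon, so it fails if the current
  # session has not picked up the docker group yet.
  if docker info >/dev/null 2>&1; then
    echo "Docker daemon reachable as $(id -un)"
  else
    echo "Cannot reach the Docker daemon as $(id -un); log out and back in, or use sudo"
  fi
else
  echo "docker not found on PATH"
fi
```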

4. Create project and copy templates

# Create n8n target directory
sudo mkdir -p /home/n8n && sudo chown -R $USER: /home/n8n

# Copy the docker compose and .env files for ONE mode only
# (both modes write to the same destination, so run just one of the two blocks)
# For single mode
cp ./single-mode/docker-compose.yml /home/n8n/docker-compose.yml
cp ./single-mode/.env /home/n8n/.env

# For queue mode
cp ./queue-mode/docker-compose.yml /home/n8n/docker-compose.yml
cp ./queue-mode/.env /home/n8n/.env

# Copy the whole folder monitoring/ → /home/n8n/monitoring/
cp -R ./monitoring /home/n8n/
single-mode/docker-compose.yml
services:
  traefik:
    image: traefik:v2.11
    restart: unless-stopped
    command:
      - "--api.dashboard=false"
      # EntryPoints
      - "--entrypoints.web.address=:80"
      - "--entrypoints.web.http.redirections.entrypoint.to=websecure"
      - "--entrypoints.web.http.redirections.entrypoint.scheme=https"
      - "--entrypoints.websecure.address=:443"
      # Providers
      - "--providers.docker=true"
      - "--providers.docker.exposedbydefault=false"
      - "--providers.docker.network=n8n-network"
      # ACME (production)
      - "--certificatesresolvers.le.acme.email=${SSL_EMAIL}"
      - "--certificatesresolvers.le.acme.storage=/letsencrypt/acme.json"
      - "--certificatesresolvers.le.acme.tlschallenge=true"
      # Logs
      - "--log.level=INFO"
      - "--accesslog=true"
      # Health check
      - "--ping=true"
      - "--ping.entrypoint=traefikping"
      - "--entrypoints.traefikping.address=:8082"
      # Prometheus metrics
      - "--metrics.prometheus=true"
      - "--metrics.prometheus.addEntryPointsLabels=true"
      - "--metrics.prometheus.addRoutersLabels=true"
      - "--metrics.prometheus.addServicesLabels=true"
      - "--entrypoints.metrics.address=:8081"
      - "--metrics.prometheus.entryPoint=metrics"
    ports:
      - "80:80"
      - "443:443"
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock:ro
      - letsencrypt:/letsencrypt
      - ./secrets/htpasswd:/etc/traefik/htpasswd:ro
    networks: [n8n-network]
    security_opt: [no-new-privileges:true]
    healthcheck:
      test: ["CMD", "wget", "--spider", "-q", "http://localhost:8082/ping"]
      interval: 10s
      timeout: 5s
      start_period: 20s
      retries: 5

  postgres:
    image: postgres:14
    restart: unless-stopped
    env_file: [.env]
    environment:
      - TZ=${GENERIC_TIMEZONE}
    volumes:
      - postgres-data:/var/lib/postgresql/data
    networks: [n8n-network]
    security_opt: [no-new-privileges:true]
    healthcheck:
      test: ["CMD", "pg_isready", "-U", "${POSTGRES_USER}"]
      interval: 10s
      timeout: 5s
      start_period: 20s
      retries: 5

  # n8n (single mode)
  main:
    image: docker.n8n.io/n8nio/n8n:${N8N_IMAGE_TAG:-latest}
    restart: unless-stopped
    env_file: [.env]
    environment:
      - TZ=${GENERIC_TIMEZONE}
      - N8N_METRICS=true
    volumes:
      - n8n-data:/home/node/.n8n
      - ./local-files:/files
    depends_on:
      postgres:
        condition: service_healthy
    networks: [n8n-network]
    security_opt: [no-new-privileges:true]
    healthcheck:
      test: ["CMD-SHELL", "wget --spider -q http://localhost:${N8N_PORT:-5678}/healthz || exit 1"]
      interval: 10s
      timeout: 5s
      start_period: 20s
      retries: 5
    labels:
      - "traefik.enable=true"
      - "traefik.docker.network=n8n-network"
      # Router & TLS
      - "traefik.http.routers.n8n.rule=Host(`${N8N_FQDN}`)"
      - "traefik.http.routers.n8n.entrypoints=websecure"
      - "traefik.http.routers.n8n.tls=true"
      - "traefik.http.routers.n8n.tls.certresolver=le"
      # Bind the router to the named Traefik service defined below
      - "traefik.http.routers.n8n.service=n8n"
      - "traefik.http.services.n8n.loadbalancer.server.port=${N8N_PORT:-5678}"
      # Middlewares
      - "traefik.http.routers.n8n.middlewares=n8n-headers,n8n-rate,n8n-retry,n8n-compress"
      # Security headers
      - "traefik.http.middlewares.n8n-headers.headers.stsSeconds=315360000"
      - "traefik.http.middlewares.n8n-headers.headers.browserXssFilter=true"
      - "traefik.http.middlewares.n8n-headers.headers.contentTypeNosniff=true"
      - "traefik.http.middlewares.n8n-headers.headers.forceSTSHeader=true"
      - "traefik.http.middlewares.n8n-headers.headers.stsIncludeSubdomains=true"
      - "traefik.http.middlewares.n8n-headers.headers.stsPreload=true"
      # Rate limiting
      - "traefik.http.middlewares.n8n-rate.ratelimit.average=100"
      - "traefik.http.middlewares.n8n-rate.ratelimit.burst=50"
      - "traefik.http.middlewares.n8n-rate.ratelimit.period=1s"
      # Retry & compression
      - "traefik.http.middlewares.n8n-retry.retry.attempts=3"
      - "traefik.http.middlewares.n8n-compress.compress=true"

  # ===== Monitoring (enabled by COMPOSE_PROFILES=monitoring) =====
  prometheus:
    profiles: ["monitoring"]
    image: prom/prometheus:latest
    restart: unless-stopped
    command:
      - "--config.file=/etc/prometheus/prometheus.yml"
      - "--storage.tsdb.retention.time=15d"
    volumes:
      - ./monitoring/prometheus.yml:/etc/prometheus/prometheus.yml:ro
      - prometheus-data:/prometheus
    networks: [n8n-network]
    security_opt: [no-new-privileges:true]
    labels:
      - "traefik.enable=${EXPOSE_PROMETHEUS:-false}"
      - "traefik.docker.network=n8n-network"
      - "traefik.http.routers.prom.rule=Host(`${PROMETHEUS_FQDN}`)"
      - "traefik.http.routers.prom.entrypoints=websecure"
      - "traefik.http.routers.prom.tls=true"
      - "traefik.http.routers.prom.tls.certresolver=le"
      - "traefik.http.services.prom.loadbalancer.server.port=9090"
      - "traefik.http.routers.prom.middlewares=prom-auth@docker"
      - "traefik.http.middlewares.prom-auth.basicauth.usersfile=${TRAEFIK_USERSFILE}"

  grafana:
    profiles: ["monitoring"]
    image: grafana/grafana:latest
    restart: unless-stopped
    environment:
      - TZ=${GENERIC_TIMEZONE}
      - GF_SECURITY_ADMIN_USER=admin
      - GF_SECURITY_ADMIN_PASSWORD=${MONITORING_BASIC_AUTH_PASS}
      - GF_SERVER_DOMAIN=${GRAFANA_FQDN}
      - GF_SERVER_ROOT_URL=https://${GRAFANA_FQDN}
      - GF_SERVER_ENFORCE_DOMAIN=true
      - GF_SECURITY_COOKIE_SECURE=true
      - GF_USERS_ALLOW_SIGN_UP=false
      - GF_PATHS_PROVISIONING=/etc/grafana/provisioning
    volumes:
      - grafana-data:/var/lib/grafana
      - ./monitoring/grafana/provisioning/datasources:/etc/grafana/provisioning/datasources:ro
      - ./monitoring/grafana/provisioning/dashboards:/etc/grafana/provisioning/dashboards:ro
      - ./monitoring/grafana/provisioning/alerts:/etc/grafana/provisioning/alerts:ro
    networks: [n8n-network]
    security_opt: [no-new-privileges:true]
    labels:
      - "traefik.enable=true"
      - "traefik.docker.network=n8n-network"
      - "traefik.http.routers.grafana.rule=Host(`${GRAFANA_FQDN}`)"
      - "traefik.http.routers.grafana.entrypoints=websecure"
      - "traefik.http.routers.grafana.tls=true"
      - "traefik.http.routers.grafana.tls.certresolver=le"
      - "traefik.http.services.grafana.loadbalancer.server.port=3000"
      - "traefik.http.routers.grafana.middlewares=grafana-auth@docker,secure-headers@docker"
      - "traefik.http.middlewares.grafana-auth.basicauth.usersfile=${TRAEFIK_USERSFILE}"
      - "traefik.http.middlewares.grafana-auth.basicauth.removeheader=true"
      - "traefik.http.middlewares.secure-headers.headers.stsSeconds=31536000"
      - "traefik.http.middlewares.secure-headers.headers.stsIncludeSubdomains=true"
      - "traefik.http.middlewares.secure-headers.headers.stsPreload=true"
      - "traefik.http.middlewares.secure-headers.headers.browserXssFilter=true"
      - "traefik.http.middlewares.secure-headers.headers.contentTypeNosniff=true"
    depends_on: [prometheus]

  postgres-exporter:
    profiles: ["monitoring"]
    image: quay.io/prometheuscommunity/postgres-exporter:latest
    restart: unless-stopped
    environment:
      - DATA_SOURCE_NAME=postgresql://${POSTGRES_USER}:${POSTGRES_PASSWORD}@postgres:5432/${POSTGRES_DB}?sslmode=disable
    networks: [n8n-network]
    security_opt: [no-new-privileges:true]
    depends_on:
      postgres:
        condition: service_healthy

  cadvisor:
    profiles: ["monitoring"]
    image: gcr.io/cadvisor/cadvisor:latest
    restart: unless-stopped
    privileged: true
    volumes:
      - /:/rootfs:ro
      - /var/run:/var/run:ro
      - /sys:/sys:ro
      - /var/lib/docker/:/var/lib/docker:ro
    networks: [n8n-network]
    security_opt: [no-new-privileges:true]

  node-exporter:
    profiles: ["monitoring"]
    image: prom/node-exporter:latest
    restart: unless-stopped
    pid: "host"
    networks: [n8n-network]
    volumes:
      - /proc:/host/proc:ro
      - /sys:/host/sys:ro
      - /:/rootfs:ro
    command:
      - "--path.procfs=/host/proc"
      - "--path.sysfs=/host/sys"
      - "--path.rootfs=/rootfs"
    security_opt: [no-new-privileges:true]

networks:
  n8n-network:
    name: n8n-network
    driver: bridge

volumes:
  n8n-data:
    external: true
  postgres-data:
    external: true
  letsencrypt:
    external: true
  prometheus-data:
    external: true
  grafana-data:
    external: true
single-mode/.env
# ================================================
#    ENV VARIABLES FOR SINGLE MODE & MONITORING
# ================================================

# -------- DOMAIN & FQDN --------
DOMAIN=example.com
SUBDOMAIN_N8N=n8n
SUBDOMAIN_GRAFANA=grafana
SUBDOMAIN_PROMETHEUS=prometheus
SSL_EMAIL=you@example.com
GENERIC_TIMEZONE=Asia/Ho_Chi_Minh

N8N_FQDN=${SUBDOMAIN_N8N}.${DOMAIN}
GRAFANA_FQDN=${SUBDOMAIN_GRAFANA}.${DOMAIN}
PROMETHEUS_FQDN=${SUBDOMAIN_PROMETHEUS}.${DOMAIN}

# -------- IMAGE & RUNTIME --------
N8N_IMAGE_TAG=latest
NODE_ENV=production
N8N_LOG_LEVEL=info
N8N_DIAGNOSTICS_ENABLED=false
N8N_BLOCK_ENV_ACCESS_IN_NODE=true

# -------- n8n URLS --------
N8N_PORT=5678
N8N_PROTOCOL=https
N8N_HOST=${N8N_FQDN}
WEBHOOK_URL=https://${N8N_FQDN}
N8N_EDITOR_BASE_URL=https://${N8N_FQDN}
N8N_PUBLIC_API_BASE_URL=https://${N8N_FQDN}
N8N_SECURE_COOKIE=true

# -------- SECURITY & SECRETS --------
# Generate with: openssl rand -base64 16
POSTGRES_PASSWORD=CHANGE_ME_BASE64_16_BYTES
# Generate with: openssl rand -base64 32
N8N_ENCRYPTION_KEY=CHANGE_ME_BASE64_32_BYTES
# Generate with: openssl rand -base64 16
N8N_BASIC_AUTH_PASSWORD=CHANGE_ME_BASE64_16_BYTES

N8N_ENFORCE_SETTINGS_FILE_PERMISSIONS=true
N8N_BASIC_AUTH_ACTIVE=true
N8N_BASIC_AUTH_USER=admin
# N8N_BASIC_AUTH_PASSWORD=${N8N_BASIC_AUTH_PASSWORD}

# -------- DATABASE SETTINGS --------
DB_TYPE=postgresdb
DB_POSTGRESDB_HOST=postgres
DB_POSTGRESDB_PORT=5432
DB_POSTGRESDB_DATABASE=n8n
DB_POSTGRESDB_USER=n8n
DB_POSTGRESDB_PASSWORD=${POSTGRES_PASSWORD}

POSTGRES_USER=n8n
# POSTGRES_PASSWORD=${POSTGRES_PASSWORD}
POSTGRES_DB=n8n

# -------- RUNNERS (internal) --------
N8N_RUNNERS_ENABLED=true
N8N_RUNNERS_MODE=internal
N8N_RUNNERS_MAX_CONCURRENCY=5
N8N_RUNNERS_AUTH_TOKEN=${N8N_BASIC_AUTH_PASSWORD}

# -------- EXECUTION BEHAVIOR --------
EXECUTIONS_DATA_PRUNE=true
EXECUTIONS_DATA_MAX_AGE=336
EXECUTIONS_RETRY_MAX=3

# -------- MONITORING STACK --------
COMPOSE_PROJECT_NAME=n8n
COMPOSE_PROFILES=
TRAEFIK_USERSFILE=/etc/traefik/htpasswd
MONITORING_BASIC_AUTH_USER=admin
# Grafana admin password and Traefik’s Basic Auth (replace this example value)
MONITORING_BASIC_AUTH_PASS=StrongPass@123
EXPOSE_PROMETHEUS=false

# -------- END OF CONFIG --------
queue-mode/docker-compose.yml
services:
  traefik:
    image: traefik:v2.11
    restart: unless-stopped
    command:
      - "--api.dashboard=false"
      # EntryPoints
      - "--entrypoints.web.address=:80"
      - "--entrypoints.web.http.redirections.entrypoint.to=websecure"
      - "--entrypoints.web.http.redirections.entrypoint.scheme=https"
      - "--entrypoints.websecure.address=:443"
      # Providers
      - "--providers.docker=true"
      - "--providers.docker.exposedbydefault=false"
      - "--providers.docker.network=n8n-network"
      # ACME (production)
      - "--certificatesresolvers.le.acme.email=${SSL_EMAIL}"
      - "--certificatesresolvers.le.acme.storage=/letsencrypt/acme.json"
      - "--certificatesresolvers.le.acme.tlschallenge=true"
      # Logs
      - "--log.level=INFO"
      - "--accesslog=true"
      # Health check
      - "--ping=true"
      - "--ping.entrypoint=traefikping"
      - "--entrypoints.traefikping.address=:8082"
      # Prometheus metrics
      - "--metrics.prometheus=true"
      - "--metrics.prometheus.addEntryPointsLabels=true"
      - "--metrics.prometheus.addRoutersLabels=true"
      - "--metrics.prometheus.addServicesLabels=true"
      - "--entrypoints.metrics.address=:8081"
      - "--metrics.prometheus.entryPoint=metrics"
    ports:
      - "80:80"
      - "443:443"
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock:ro
      - letsencrypt:/letsencrypt
      - ./secrets/htpasswd:/etc/traefik/htpasswd:ro
    networks: [n8n-network]
    security_opt: [no-new-privileges:true]
    healthcheck:
      test: ["CMD", "wget", "--spider", "-q", "http://localhost:8082/ping"]
      interval: 10s
      timeout: 5s
      start_period: 10s
      retries: 5

  postgres:
    image: postgres:14
    restart: unless-stopped
    env_file: [.env]
    environment:
      - TZ=${GENERIC_TIMEZONE}
    volumes:
      - postgres-data:/var/lib/postgresql/data
    networks: [n8n-network]
    security_opt: [no-new-privileges:true]
    healthcheck:
      test: ["CMD", "pg_isready", "-U", "${POSTGRES_USER}"]
      interval: 10s
      timeout: 5s
      start_period: 10s
      retries: 5

  redis:
    image: redis:7
    restart: unless-stopped
    environment:
      - TZ=${GENERIC_TIMEZONE}
    command: ["redis-server", "--requirepass", "${REDIS_PASSWORD}"]
    volumes:
      - redis-data:/data
    networks: [n8n-network]
    security_opt: [no-new-privileges:true]
    healthcheck:
      test: ["CMD", "redis-cli", "-a", "${REDIS_PASSWORD}", "ping"]
      interval: 10s
      timeout: 10s
      start_period: 10s
      retries: 5

  # Main (UI, schedules, webhooks)
  main:
    image: docker.n8n.io/n8nio/n8n:${N8N_IMAGE_TAG:-latest}
    restart: unless-stopped
    env_file: [.env]
    environment:
      - TZ=${GENERIC_TIMEZONE}
      - N8N_METRICS=true
      - N8N_METRICS_INCLUDE_QUEUE_METRICS=true
    volumes:
      - n8n-data:/home/node/.n8n
      - ./local-files:/files
    depends_on:
      postgres:
        condition: service_healthy
      redis:
        condition: service_healthy
    healthcheck:
      test: ["CMD-SHELL", "wget --spider -q http://localhost:${N8N_PORT:-5678}/healthz || exit 1"]
      interval: 10s
      timeout: 5s
      start_period: 20s
      retries: 5
    labels:
      - "traefik.enable=true"
      - "traefik.docker.network=n8n-network"
      # Router & TLS
      - "traefik.http.routers.n8n.rule=Host(`${N8N_FQDN}`)"
      - "traefik.http.routers.n8n.entrypoints=websecure"
      - "traefik.http.routers.n8n.tls=true"
      - "traefik.http.routers.n8n.tls.certresolver=le"
      # Traefik 'service' label name
      - "traefik.http.routers.n8n.service=main"
      - "traefik.http.services.main.loadbalancer.server.port=${N8N_PORT:-5678}"
      # Middlewares
      - "traefik.http.routers.n8n.middlewares=n8n-headers,n8n-rate,n8n-retry,n8n-compress"
      - "traefik.http.middlewares.n8n-headers.headers.stsSeconds=315360000"
      - "traefik.http.middlewares.n8n-headers.headers.browserXssFilter=true"
      - "traefik.http.middlewares.n8n-headers.headers.contentTypeNosniff=true"
      - "traefik.http.middlewares.n8n-headers.headers.forceSTSHeader=true"
      - "traefik.http.middlewares.n8n-headers.headers.stsIncludeSubdomains=true"
      - "traefik.http.middlewares.n8n-headers.headers.stsPreload=true"
      - "traefik.http.middlewares.n8n-rate.ratelimit.average=100"
      - "traefik.http.middlewares.n8n-rate.ratelimit.burst=50"
      - "traefik.http.middlewares.n8n-rate.ratelimit.period=1s"
      - "traefik.http.middlewares.n8n-retry.retry.attempts=3"
      - "traefik.http.middlewares.n8n-compress.compress=true"
    networks: [n8n-network]
    security_opt: [no-new-privileges:true]

  # External Task Runner for n8n-main
  runner-main:
    image: docker.n8n.io/n8nio/n8n:${N8N_IMAGE_TAG:-latest}
    restart: unless-stopped
    env_file: [.env]
    environment:
      - TZ=${GENERIC_TIMEZONE}
    entrypoint: ["/usr/local/bin/task-runner-launcher"]
    command: ["javascript"]
    depends_on:
      main:
        condition: service_started
      redis:
        condition: service_healthy
    networks: [n8n-network]
    security_opt: [no-new-privileges:true]

  # Worker(s) – scale horizontally
  worker:
    image: docker.n8n.io/n8nio/n8n:${N8N_IMAGE_TAG:-latest}
    restart: unless-stopped
    env_file: [.env]
    environment:
      - TZ=${GENERIC_TIMEZONE}
    command: ["worker", "--concurrency=${N8N_WORKER_CONCURRENCY}"]
    volumes:
      - n8n-data:/home/node/.n8n
      - ./local-files:/files
    depends_on:
      postgres:
        condition: service_healthy
      redis:
        condition: service_healthy
    networks: [n8n-network]
    security_opt: [no-new-privileges:true]

  # ===== Monitoring (profile) =====
  prometheus:
    profiles: ["monitoring"]
    image: prom/prometheus:latest
    restart: unless-stopped
    command:
      - "--config.file=/etc/prometheus/prometheus.yml"
      - "--storage.tsdb.retention.time=15d"
    volumes:
      - ./monitoring/prometheus.yml:/etc/prometheus/prometheus.yml:ro
      - prometheus-data:/prometheus
    networks: [n8n-network]
    security_opt: [no-new-privileges:true]
    labels:
      - "traefik.enable=${EXPOSE_PROMETHEUS:-false}"
      - "traefik.docker.network=n8n-network"
      - "traefik.http.routers.prom.rule=Host(`${PROMETHEUS_FQDN}`)"
      - "traefik.http.routers.prom.entrypoints=websecure"
      - "traefik.http.routers.prom.tls=true"
      - "traefik.http.routers.prom.tls.certresolver=le"
      - "traefik.http.services.prom.loadbalancer.server.port=9090"
      - "traefik.http.routers.prom.middlewares=prom-auth@docker"
      - "traefik.http.middlewares.prom-auth.basicauth.usersfile=${TRAEFIK_USERSFILE}"

  grafana:
    profiles: ["monitoring"]
    image: grafana/grafana:latest
    restart: unless-stopped
    environment:
      - TZ=${GENERIC_TIMEZONE}
      - GF_SECURITY_ADMIN_USER=admin
      - GF_SECURITY_ADMIN_PASSWORD=${MONITORING_BASIC_AUTH_PASS}
      - GF_SERVER_DOMAIN=${GRAFANA_FQDN}
      - GF_SERVER_ROOT_URL=https://${GRAFANA_FQDN}
      - GF_SERVER_ENFORCE_DOMAIN=true
      - GF_SECURITY_COOKIE_SECURE=true
      - GF_USERS_ALLOW_SIGN_UP=false
      - GF_PATHS_PROVISIONING=/etc/grafana/provisioning
    volumes:
      - grafana-data:/var/lib/grafana
      - ./monitoring/grafana/provisioning/datasources:/etc/grafana/provisioning/datasources:ro
      - ./monitoring/grafana/provisioning/dashboards:/etc/grafana/provisioning/dashboards:ro
      - ./monitoring/grafana/provisioning/alerts:/etc/grafana/provisioning/alerts:ro
    networks: [n8n-network]
    security_opt: [no-new-privileges:true]
    labels:
      - "traefik.enable=true"
      - "traefik.docker.network=n8n-network"
      - "traefik.http.routers.grafana.rule=Host(`${GRAFANA_FQDN}`)"
      - "traefik.http.routers.grafana.entrypoints=websecure"
      - "traefik.http.routers.grafana.tls=true"
      - "traefik.http.routers.grafana.tls.certresolver=le"
      - "traefik.http.services.grafana.loadbalancer.server.port=3000"
      # PROXY BASIC AUTH (usersFile)
      - "traefik.http.routers.grafana.middlewares=grafana-auth@docker,secure-headers@docker"
      - "traefik.http.middlewares.grafana-auth.basicauth.usersfile=${TRAEFIK_USERSFILE}"
      - "traefik.http.middlewares.grafana-auth.basicauth.removeheader=true"
      # Optional hardening (HSTS, XSS protection, etc.)
      - "traefik.http.middlewares.secure-headers.headers.stsSeconds=31536000"
      - "traefik.http.middlewares.secure-headers.headers.stsIncludeSubdomains=true"
      - "traefik.http.middlewares.secure-headers.headers.stsPreload=true"
      - "traefik.http.middlewares.secure-headers.headers.browserXssFilter=true"
      - "traefik.http.middlewares.secure-headers.headers.contentTypeNosniff=true"
    depends_on: [prometheus]

  postgres-exporter:
    profiles: ["monitoring"]
    image: quay.io/prometheuscommunity/postgres-exporter:latest
    restart: unless-stopped
    environment:
      - DATA_SOURCE_NAME=postgresql://${POSTGRES_USER}:${POSTGRES_PASSWORD}@postgres:5432/${POSTGRES_DB}?sslmode=disable
    networks: [n8n-network]
    security_opt: [no-new-privileges:true]
    depends_on:
      postgres:
        condition: service_healthy

  redis-exporter:
    profiles: ["monitoring"]
    image: oliver006/redis_exporter:latest
    restart: unless-stopped
    command:
      - "--redis.addr=redis:6379"
      - "--redis.password=${REDIS_PASSWORD}"
    networks: [n8n-network]
    security_opt: [no-new-privileges:true]
    depends_on:
      redis:
        condition: service_healthy

  cadvisor:
    profiles: ["monitoring"]
    image: gcr.io/cadvisor/cadvisor:latest
    restart: unless-stopped
    privileged: true
    volumes:
      - /:/rootfs:ro
      - /var/run:/var/run:ro
      - /sys:/sys:ro
      - /var/lib/docker/:/var/lib/docker:ro
    networks: [n8n-network]
    security_opt: [no-new-privileges:true]

  node-exporter:
    profiles: ["monitoring"]
    image: prom/node-exporter:latest
    restart: unless-stopped
    pid: "host"
    networks: [n8n-network]
    volumes:
      - /proc:/host/proc:ro
      - /sys:/host/sys:ro
      - /:/rootfs:ro
    command:
      - "--path.procfs=/host/proc"
      - "--path.sysfs=/host/sys"
      - "--path.rootfs=/rootfs"
    security_opt: [no-new-privileges:true]

networks:
  n8n-network:
    name: n8n-network
    driver: bridge

volumes:
  n8n-data:
    external: true
  postgres-data:
    external: true
  redis-data:
    external: true
  letsencrypt:
    external: true
  prometheus-data:
    external: true
  grafana-data:
    external: true
queue-mode/.env
# ================================================
#    ENV VARIABLES FOR QUEUE MODE & MONITORING
# ================================================

# -------- DOMAIN & FQDN --------
DOMAIN=example.com
SUBDOMAIN_N8N=n8n
SUBDOMAIN_GRAFANA=grafana
SUBDOMAIN_PROMETHEUS=prometheus
SSL_EMAIL=you@example.com
GENERIC_TIMEZONE=Asia/Ho_Chi_Minh

N8N_FQDN=${SUBDOMAIN_N8N}.${DOMAIN}
GRAFANA_FQDN=${SUBDOMAIN_GRAFANA}.${DOMAIN}
PROMETHEUS_FQDN=${SUBDOMAIN_PROMETHEUS}.${DOMAIN}

# -------- IMAGE & RUNTIME --------
N8N_IMAGE_TAG=latest
NODE_ENV=production
N8N_LOG_LEVEL=info
N8N_DIAGNOSTICS_ENABLED=false
N8N_BLOCK_ENV_ACCESS_IN_NODE=true

# -------- n8n URLS --------
N8N_PORT=5678
N8N_PROTOCOL=https
N8N_HOST=${N8N_FQDN}
WEBHOOK_URL=https://${N8N_FQDN}
N8N_EDITOR_BASE_URL=https://${N8N_FQDN}
N8N_PUBLIC_API_BASE_URL=https://${N8N_FQDN}
N8N_SECURE_COOKIE=true
# Enable Metrics
N8N_METRICS=true
N8N_METRICS_INCLUDE_QUEUE_METRICS=true

# -------- SECURITY & SECRETS --------
# Generate with: openssl rand -base64 16
POSTGRES_PASSWORD=CHANGE_ME_BASE64_16_BYTES
# Generate with: openssl rand -base64 16
REDIS_PASSWORD=CHANGE_ME_BASE64_16_BYTES
# Generate with: openssl rand -base64 32
N8N_ENCRYPTION_KEY=CHANGE_ME_BASE64_32_BYTES
# Generate with: openssl rand -base64 16
N8N_BASIC_AUTH_PASSWORD=CHANGE_ME_BASE64_16_BYTES

N8N_ENFORCE_SETTINGS_FILE_PERMISSIONS=true
N8N_BASIC_AUTH_ACTIVE=true
N8N_BASIC_AUTH_USER=admin
# N8N_BASIC_AUTH_PASSWORD=${N8N_BASIC_AUTH_PASSWORD}

# -------- DATABASE SETTINGS --------
DB_TYPE=postgresdb
DB_POSTGRESDB_HOST=postgres
DB_POSTGRESDB_PORT=5432
DB_POSTGRESDB_DATABASE=n8n
DB_POSTGRESDB_USER=n8n
DB_POSTGRESDB_PASSWORD=${POSTGRES_PASSWORD}

POSTGRES_USER=n8n
# POSTGRES_PASSWORD=${POSTGRES_PASSWORD}
POSTGRES_DB=n8n

# -------- QUEUE MODE --------
EXECUTIONS_MODE=queue
QUEUE_BULL_REDIS_HOST=redis
QUEUE_BULL_REDIS_PORT=6379
QUEUE_BULL_REDIS_PASSWORD=${REDIS_PASSWORD}
OFFLOAD_MANUAL_EXECUTIONS_TO_WORKERS=true
QUEUE_HEALTH_CHECK_ACTIVE=true

# Workers scaling
N8N_WORKER_CONCURRENCY=5
N8N_WORKER_SCALE=2

# -------- EXTERNAL RUNNERS --------
N8N_RUNNERS_ENABLED=true
N8N_RUNNERS_MODE=external
N8N_RUNNERS_BROKER_LISTEN_ADDRESS=0.0.0.0
N8N_RUNNERS_MAX_CONCURRENCY=5
N8N_RUNNERS_AUTH_TOKEN=${N8N_BASIC_AUTH_PASSWORD}

# -------- EXECUTION SETTINGS --------
EXECUTIONS_TIMEOUT=3600
EXECUTIONS_DATA_PRUNE=true
EXECUTIONS_DATA_MAX_AGE=336
EXECUTIONS_RETRY_MAX=3

# -------- MONITORING STACK --------
COMPOSE_PROJECT_NAME=n8n
COMPOSE_PROFILES=
TRAEFIK_USERSFILE=/etc/traefik/htpasswd
MONITORING_BASIC_AUTH_USER=admin
# Grafana admin password and Traefik’s Basic Auth (replace this example value)
MONITORING_BASIC_AUTH_PASS=StrongPass@123
EXPOSE_PROMETHEUS=false

# -------- END OF CONFIG --------

5. Launch the n8n stack

  • Create the htpasswd file on the host
# Create the directory (on the HOST). Both compose files bind-mount
# ./secrets/htpasswd (relative to /home/n8n) to /etc/traefik/htpasswd inside
# the Traefik container, which is the path TRAEFIK_USERSFILE points to.
install -d -m 0750 /home/n8n/secrets

# Create the file with your first user (bcrypt)
# You'll be prompted for the password
sudo apt-get update && sudo apt-get install -y apache2-utils
htpasswd -cB /home/n8n/secrets/htpasswd admin

# Tighten permissions
chmod 640 /home/n8n/secrets/htpasswd
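
Both compose files bind-mount ./secrets/htpasswd (relative to the project directory) into the Traefik container. If that source path does not exist when the stack starts, Docker creates an empty directory in its place and Basic Auth on Grafana fails. A minimal pre-flight sketch, assuming you pass whichever host path your compose file mounts; `valid_htpasswd` is a helper name made up for this example.

```shell
#!/usr/bin/env bash
# valid_htpasswd PATH -> succeeds when PATH is a non-empty regular file
# whose non-blank lines all have the "user:hash" shape.
valid_htpasswd() {
  [ -f "$1" ] && [ -s "$1" ] || return 1
  # grep -v selects lines that are neither "user:hash" nor blank;
  # the check passes only when no such line exists.
  ! grep -Evq '^[^:[:space:]]+:.+$|^[[:space:]]*$' "$1"
}

# Assumed default: the source path of the bind mount in the compose files,
# relative to the directory the script is run from.
file="${1:-./secrets/htpasswd}"

if valid_htpasswd "$file"; then
  echo "OK: $file contains $(grep -c ':' "$file") user line(s)"
elif [ -d "$file" ]; then
  echo "ERROR: $file is a directory; remove it and create the file with htpasswd"
else
  echo "ERROR: $file is missing, empty, or malformed"
fi
```

Running this from /home/n8n before `docker compose up -d` catches the directory-instead-of-file case early, which is otherwise easy to miss because Traefik still starts.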
  • Update .env file
# Navigate to project directory
cd /home/n8n

# Generate strong secrets (paste results into .env):
# 16-byte base64 for passwords
openssl rand -base64 16
# 32-byte base64 for encryption key
openssl rand -base64 32

# Edit /home/n8n/.env and set the values below
nano .env

# Base & subdomains
DOMAIN=example.com
SUBDOMAIN_N8N=n8n
SUBDOMAIN_GRAFANA=grafana
SUBDOMAIN_PROMETHEUS=prometheus

# (Optional, if your compose reads them directly)
N8N_FQDN=n8n.${DOMAIN}
GRAFANA_FQDN=grafana.${DOMAIN}
PROMETHEUS_FQDN=prometheus.${DOMAIN}

# Let's Encrypt email
SSL_EMAIL=you@example.com

# n8n version tag (pin if you want a fixed version)
N8N_IMAGE_TAG=latest

# Enable monitoring profile
COMPOSE_PROFILES=monitoring
# Public Prometheus? (true/false)
EXPOSE_PROMETHEUS=false

# Strong secrets: generate new ones with the openssl commands above (don't reuse example values)
POSTGRES_PASSWORD=<base64-16>
N8N_BASIC_AUTH_PASSWORD=<base64-16>
N8N_ENCRYPTION_KEY=<base64-32>

# Traefik Basic Auth protecting Grafana/Prometheus
TRAEFIK_USERSFILE=/etc/traefik/htpasswd

# Then save the .env file
  • Validate the Compose file and bring the stack up
# Create a directory called local-files for sharing files between the n8n instance and the host system
mkdir -p ./local-files
# Let your host user own the folder; n8n runs as user 1000 in the container
chown -R ${SUDO_USER:-$USER}:${SUDO_USER:-$USER} ./local-files
chmod 755 ./local-files

# Validate YAML & env expansion first
docker compose config

# Pull images (optional but recommended)
docker compose pull

# Manually create the volumes
docker volume create n8n-data
docker volume create postgres-data
docker volume create letsencrypt

# Monitoring
docker volume create grafana-storage
docker volume create prometheus-data
# If you choose queue mode, add:
docker volume create redis-data

# Seed the ACME file inside letsencrypt
docker run --rm -v letsencrypt:/data alpine \
  sh -c "touch /data/acme.json && chmod 600 /data/acme.json"

# Confirm all volumes created
docker volume ls | grep -E 'n8n-data|postgres-data|letsencrypt|grafana-storage|prometheus-data|redis-data'

# Start the n8n stack
docker compose up -d
  • Health Checks & Sanity Testing
# Check that all Docker containers are running and healthy
docker ps -a --format 'table {{.Names}}\t{{.Image}}\t{{.Status}}\t{{.Ports}}'

# If any container is not healthy, check the container logs

docker compose logs -f traefik
docker compose logs -f n8n
docker compose logs -f postgres
docker compose logs -f grafana
docker compose logs -f prometheus
docker compose logs -f redis     # if used

# Or: One-shot (no follow), grouped by container:
docker ps -q | xargs -I{} sh -c 'echo "===== {} ====="; docker logs --tail=200 {}'

# Or: Live stream for every running container, merged with name prefix:
# This will prefix each line with the container name, making merged logs easier to read.
for c in $(docker ps -q); do
    n=$(docker inspect -f '{{.Name}}' "$c" | sed 's#^/##')
    # 2>&1 so lines the container wrote to stderr get the name prefix too
    docker logs -f --tail=0 "$c" 2>&1 | sed "s/^/[$n] /" &
done; wait
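
The manual check above can also be scripted. The following is a minimal sketch, not something the stack ships with: list_unhealthy and wait_until_healthy are hypothetical names, and the parsing relies on the human-readable Status column of docker ps (for example "Up 5 minutes (healthy)" or "Exited (1) 2 minutes ago"). It looks at every container on the host, not only the ones in this Compose project.

```shell
# Hypothetical helpers for a post-deploy sanity check.

# Print the name of every container that is not "Up", or that is "Up" but
# whose health check reports "unhealthy" or "health: starting".
list_unhealthy() {
    docker ps -a --format '{{.Names}}|{{.Status}}' |
        awk -F'|' '$2 !~ /^Up / || $2 ~ /\((unhealthy|health: starting)\)/ { print $1 }'
}

# Poll every 5 seconds until list_unhealthy prints nothing.
# $1 = timeout in seconds (default 120). On timeout, print the remaining
# container names and return 1.
wait_until_healthy() {
    timeout=${1:-120}
    elapsed=0
    while [ -n "$(list_unhealthy)" ]; do
        if [ "$elapsed" -ge "$timeout" ]; then
            list_unhealthy
            return 1
        fi
        sleep 5
        elapsed=$((elapsed + 5))
    done
}
```

A typical call right after `docker compose up -d` would be `wait_until_healthy 180 || echo "Some containers did not become healthy"`, followed by the log commands above for whichever names it prints.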

6. Accessing Dashboards

Open Grafana at https://grafana.domain.com and sign in:

  • First prompt: Traefik basic auth (user/password from the htpasswd file created earlier, /etc/traefik/htpasswd).
  • Then Grafana login: admin / ${GRAFANA_ADMIN_PASSWORD}.
  • In Grafana, go to Dashboards → Import, search by ID or paste JSON.

Recommended dashboards:

Service       | Dashboard Name                 | Grafana Dashboard ID                 | Link
--------------|--------------------------------|--------------------------------------|----------------------------
n8n           | n8n Queue Mode (Full)          | (see JSON under dashboard directory) |
Traefik       | Traefik v2 Metrics             | 12250                                | Traefik 2.x Dashboard
PostgreSQL    | PostgreSQL Database            | 9628                                 | Postgres Exporter Dashboard
Redis         | Redis Dashboard for Prometheus | 763                                  | Redis Dashboard
Node Exporter | Node Exporter Full             | 1860                                 | Node Exporter Full
cAdvisor      | cAdvisor Exporter Dashboard    | 14282                                | cAdvisor Dashboard

How to import:

  1. Go to Grafana → Dashboards → Import.
  2. Enter the Dashboard ID from the table above.
  3. Select your Prometheus datasource.
  4. Click Import.
Figure: Example of importing a Grafana dashboard.
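
If you prefer to keep the community dashboards as files (for review or version control) instead of typing IDs into the UI, the JSON can be fetched first and then uploaded on the same Import page. This is a sketch under assumptions: dashboard_url and fetch_dashboards are hypothetical helper names, and the download path is the public grafana.com endpoint for the latest revision of a dashboard ID.

```shell
# Hypothetical helpers for downloading community dashboards by ID.

# Build the grafana.com download URL for the latest revision of a dashboard.
dashboard_url() {
    printf 'https://grafana.com/api/dashboards/%s/revisions/latest/download\n' "$1"
}

# Download each given dashboard ID to <dir>/<id>.json.
# $1 = target directory, remaining arguments = dashboard IDs.
fetch_dashboards() {
    dir=$1
    shift
    mkdir -p "$dir"
    for id in "$@"; do
        curl -fsSL "$(dashboard_url "$id")" -o "$dir/$id.json" || return 1
    done
}
```

For the IDs recommended in this guide: `fetch_dashboards ./dashboards 12250 9628 763 1860 14282`. Each resulting file can then be uploaded under Dashboards → Import, where Grafana still asks you to pick the Prometheus datasource.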

Automating n8n Monitoring with a Script

If you don’t want to stitch Compose files and provisioning by hand, use the n8n Manager script to install, secure, and operate the whole stack (n8n + Prometheus + Grafana + exporters) in one go. It follows the same architecture in this guide and bakes in health checks, TLS, and sane defaults.

What the script automates

  • Install / Upgrade the full stack on Ubuntu (Docker + Compose, volumes, pinned tags).
  • Monitoring profile: Prometheus, Grafana, Node Exporter, cAdvisor, Postgres/Redis exporters, Traefik metrics.
  • Security: Let’s Encrypt TLS via Traefik, dashboards behind Basic Auth, Prometheus kept private (opt-in to expose).
  • Mode-aware: single or queue mode
  • Ops workflows: backup (local + optional rclone to Drive), restore, and cleanup with safety checks.

Command line overview:

./n8n_manager.sh -h
Usage: ./n8n_manager.sh [ONE ACTION] [OPTIONS]

Actions (choose exactly one):
  -a, --available
        List available n8n versions

  -i, --install <DOMAIN>
        Install n8n with the given base domain (e.g., example.com)
        Optional: --mode single|queue  (default: single)
        Optional: -v|--version <tag>

  -u, --upgrade
        Upgrade n8n to target version (or latest). Domain/FQDNs are read from .env.

  -b, --backup
        Run backup (skip if no changes unless -f)

  -r, --restore <FILE_OR_REMOTE>
        Restore from local file or rclone remote (e.g. gdrive:folder/file.tar.gz)

  -c, --cleanup [safe|all]  Stop stack & remove resources (preview; confirm in 'all')

Options:
  --mode <single|queue>     (install only; default: single)
  -v, --version <tag>       Target n8n version (default: latest stable)
  -m, --ssl-email <email>   LE certificate email (install/upgrade)
  -d, --dir <path>          Target n8n directory (default: /home/n8n)
  -l, --log-level <LEVEL>   DEBUG | INFO (default) | WARN | ERROR
  -f, --force               Upgrade: allow downgrade or redeploy; Backup: force even if unchanged
  -e, --email-to <email>    Send notifications to this address (requires SMTP_USER/SMTP_PASS env)
  -n, --notify-on-success   Also email on success (not just failures)
  -s, --remote-name <name>  rclone remote root (e.g. gdrive-user or gdrive-user:/n8n-backups)
  -h, --help                Show this help

# Monitoring-related (install-time):
  --monitoring                        Enable Prometheus/Grafana profile
  --expose-prometheus                 Expose Prometheus publicly (default: private)
  --subdomain-n8n <sub>               Override n8n subdomain (default: n8n)
  --subdomain-grafana <sub>           Override Grafana subdomain (default: grafana)
  --subdomain-prometheus <sub>        Override Prometheus subdomain (default: prometheus)
  --basic-auth-user <user>            Traefik basic auth user for Grafana/Prometheus
  --basic-auth-pass <pass>            Traefik basic auth pass for Grafana/Prometheus

Examples:
  ./n8n_manager.sh -a
      # List available versions

  ./n8n_manager.sh --install example.com -m you@example.com
      # Install the latest n8n version with single mode

  ./n8n_manager.sh --install example.com -m you@example.com -v 1.105.3 --mode queue
      # Install a specific n8n version with queue mode

  ./n8n_manager.sh --install example.com -m you@example.com -d /path/to/n8n --mode queue
      # Install the latest n8n version (queue mode) to a specific target directory

  ./n8n_manager.sh --install example.com -m you@example.com --mode queue --monitoring --basic-auth-user admin --basic-auth-pass 'StrongPass@123'
      # Install the latest n8n version (queue mode) with monitoring (Grafana + Prometheus)

  ./n8n_manager.sh --upgrade
      # Upgrade to the latest n8n version (domain/FQDNs read from .env)

  ./n8n_manager.sh --upgrade -f -v 1.107.2
      # Upgrade to a specific n8n version

  ./n8n_manager.sh --backup --remote-name gdrive-user --email-to ops@example.com --notify-on-success
      # Backup and upload to Google Drive, notify via email

  ./n8n_manager.sh --restore backups/your_backup_file.tar.gz
      # Restore with the tar.gz file at local

Install n8n single mode with monitoring

Use the n8n_manager.sh script to deploy the latest n8n in single mode plus the full observability stack in one shot.

# Install the latest n8n version (single mode) with monitoring
./n8n_manager.sh --install example.com -m you@example.com --mode single --monitoring --basic-auth-user admin --basic-auth-pass "StrongPass@123"

What it deploys:

  • n8n (single container) behind Traefik (HTTPS + Basic Auth for dashboards)
  • PostgreSQL
  • Prometheus (private), Grafana (public behind Basic Auth)
  • Exporters: Node Exporter, cAdvisor, Postgres Exporter, Traefik metrics.
# Logs after the installation finishes:
═════════════════════════════════════════════════════════════
N8N has been successfully installed!
Installation Mode:       single
Domain (n8n):            https://n8n.example.com
Grafana:                 https://grafana.example.com
Prometheus:              (internal only)
Installed Version:       1.111.0
Install Timestamp:       2025-09-09_23-41-53
Installed By:            root
Target Directory:        /home/n8n
SSL Email:               you@example.com
Execution log:           /home/n8n/logs/install_n8n_2025-09-09_23-41-53.log
═════════════════════════════════════════════════════════════

Open https://grafana.example.com and log in as admin with the password you passed to --basic-auth-pass (StrongPass@123 in this example).

Figure: Grafana dashboard for n8n in single mode.
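
A quick way to confirm that the dashboards really sit behind Basic Auth is to request them without credentials and check the status code. The sketch below assumes the Traefik Basic Auth middleware is attached to the Grafana router as described in this guide, in which case an anonymous request should be answered with 401; expect_status is a hypothetical helper name.

```shell
# Hypothetical helper: compare the HTTP status of an unauthenticated GET
# with the status you expect.
# $1 = URL, $2 = expected status code. Returns 0 on match, 1 otherwise.
expect_status() {
    url=$1
    want=$2
    # -o /dev/null discards the body; -w prints only the status code.
    got=$(curl -s -o /dev/null -w '%{http_code}' --max-time 10 "$url")
    if [ "$got" = "$want" ]; then
        echo "OK   $url -> $got"
    else
        echo "FAIL $url -> $got (expected $want)"
        return 1
    fi
}
```

For the example domain, `expect_status https://grafana.example.com 401` should report OK. A 200 here would suggest the middleware is not applied, and 000 means curl could not connect at all (DNS, firewall, or certificate issuance still in progress).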

Install n8n queue mode with monitoring

Deploys n8n main and workers backed by Redis (BullMQ) with full observability.

# Install the latest n8n version (queue mode with 2 workers) with monitoring
./n8n_manager.sh --install example.com -m you@example.com --mode queue --monitoring --basic-auth-user admin --basic-auth-pass "StrongPass@123"

What it deploys:

  • n8n Main (editor/API) + 2 n8n Workers (execute jobs)
  • Redis (BullMQ) for the queue
  • PostgreSQL
  • Prometheus (private), Grafana (public behind Basic Auth)
  • Exporters: Node, cAdvisor, Postgres Exporter, Redis Exporter, Traefik metrics
# Logs after the installation finishes:
═════════════════════════════════════════════════════════════
N8N has been successfully installed!
Installation Mode:       queue
Domain (n8n):            https://n8n.example.com
Grafana:                 https://grafana.example.com
Prometheus:              (internal only)
Installed Version:       1.111.0
Install Timestamp:       2025-09-09_15-22-05
Installed By:            root
Target Directory:        /home/n8n
SSL Email:               you@example.com
Execution log:           /home/n8n/logs/install_n8n_2025-09-09_15-22-05.log
═════════════════════════════════════════════════════════════

Read more about n8n queue mode manual setup here.

Grafana Dashboards

Open https://grafana.example.com and log in as admin with the password you passed to --basic-auth-pass (StrongPass@123 in this example).

Grafana Dashboard: n8n – Full Health and Performance

This board gives operators a single place to confirm service health, reliability, performance, and capacity, and it points directly to the next action: for example, fix a failing workflow, add workers, or add resources.

Figure: n8n Grafana dashboard.

What it shows (at a glance):

  • n8n Up / Versions / Uptime – sanity checks: service is up, which n8n and Node.js versions are running, and for how long.
  • Active Workflows / Active Executions – current concurrency; useful to confirm traffic vs idle periods.
  • Queue Backlog (waiting) – number of jobs waiting in Redis/BullMQ. Zero is healthy; growing lines mean workers can’t keep up.
  • Event Loop Lag – Node.js responsiveness in ms. Sustained >50–100 ms suggests heavy GC, blocking tasks, or CPU contention.
  • RSS Memory – resident memory of the n8n process. Rising without relief can indicate a leak or oversized executions.
  • Open File Descriptors – OS handles in use; unexpected growth can hint at runaway connections/files.

Time-series you’ll use to diagnose:

  • Queue throughput (completed / failed per second) – are workers actually draining the queue? Spikes in failed/s flag workflow or dependency issues.
  • Execution success ratio (5m) – reliability SLO; drops mean user-visible failures (investigate top failing workflows).
  • Queue backlog by instance – isolates which n8n instance (main/workers) is bottlenecked.
  • API error rate (5xx per second) – edge symptoms (often Traefik/API/backend problems). If 5xx rises while backlog is flat, check integrations and credentials.
  • Queue state (active / delayed) – whether jobs are actively processing or getting delayed/retried.
  • CPU usage (cores) – container CPU; plateaus near vCPU limits mean you should scale workers or raise resources.
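
The same signals can be spot-checked from a terminal through the Prometheus HTTP API, which is handy when Grafana itself is the thing you are debugging. This is a sketch with assumptions: prom_query is a hypothetical helper, PROM_URL defaults to http://localhost:9090 (Prometheus is private in this stack, so run it from a place that can reach it), and the metric names in the usage note are assumptions about what your n8n and Traefik versions export.

```shell
# Hypothetical helper: run an instant query against the Prometheus HTTP API.
# PROM_URL is an assumption; point it at wherever Prometheus is reachable.
PROM_URL=${PROM_URL:-http://localhost:9090}

# $1 = PromQL expression. Prints the raw JSON response.
prom_query() {
    # --get with --data-urlencode sends the expression as an encoded
    # ?query=... parameter on a GET request.
    curl -fsS --get --data-urlencode "query=$1" "$PROM_URL/api/v1/query"
}
```

`prom_query 'up'` lists every scrape target with 1 (reachable) or 0 (down). Assuming the usual metric names, `prom_query 'n8n_scaling_mode_queue_jobs_waiting'` would return the queue backlog and `prom_query 'sum(rate(traefik_service_requests_total{code=~"5.."}[5m]))'` the 5xx rate at the edge; check the Prometheus expression browser for the exact names your versions expose.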

Grafana Dashboard: Node Exporter Full

A complete view of your VPS health—CPU, memory, disks, filesystems, and network—sourced from node_exporter on :9100. Use this board to decide whether an incident is infrastructure pressure (host-level) or an app/container issue (then drill into cAdvisor and the n8n dashboards).

Figure: Node Exporter Full dashboard in Grafana.

How to read it fast

  • CPU & Load: sustained load > core count or CPU busy >75% ⇒ CPU-bound.
  • Memory: any sustained swap used > 0 ⇒ RAM pressure.
  • Disk: filesystem >80% or rising iowait/latency ⇒ storage bottleneck.
  • Network: errors/drops/retransmits > 0 ⇒ NIC or upstream issues.

Triage rule:
If Node Exporter is green but n8n is red → look at workflow errors, Redis queue depth, or Postgres locks.
If Node Exporter is red → scale/optimize the host first (more vCPU/RAM, faster disk, free space).
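
The thresholds above can be written down as a small decision function, which makes the triage rule easy to reuse in a runbook. This is an illustrative sketch only: host_pressure is a hypothetical name, the inputs are values you read off the dashboard (or collect yourself), and a single reading cannot capture "sustained", so treat the result as a hint rather than a verdict. The network check is left out for brevity.

```shell
# Hypothetical helper encoding the "How to read it fast" thresholds.
# Arguments: load1  cpu_cores  cpu_busy_percent  swap_used_bytes  fs_used_percent
# Prints one line per detected pressure type, or "host-ok" if none apply.
host_pressure() {
    awk -v load="$1" -v cores="$2" -v cpu="$3" -v swap="$4" -v fs="$5" 'BEGIN {
        n = 0
        if (load > cores || cpu > 75) { print "cpu-bound";    n++ }
        if (swap > 0)                 { print "ram-pressure"; n++ }
        if (fs > 80)                  { print "storage";      n++ }
        if (n == 0) print "host-ok"
    }'
}
```

For instance, `host_pressure 5.2 4 60 0 45` reports cpu-bound because the load exceeds the four cores, while `host_pressure 1.0 4 30 0 50` reports host-ok, which per the triage rule means the next stop is the n8n, Redis, and Postgres dashboards.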

Grafana Dashboard: cAdvisor

Real-time container-level telemetry from cAdvisor (:8080/metrics), scraped by Prometheus. It answers which container is the bottleneck when the host looks fine.

Figure: cAdvisor Grafana dashboard.

Key panels:

  • CPU Usage (per container): spot “noisy neighbors” (e.g., n8n-main, workers, postgres, prometheus spikes).
  • Memory Usage & Cached: working set growth vs file cache; sustained growth in n8n-main/workers hints at leaks or oversized payloads.
  • Network (rx/tx): per-container throughput to catch chatty services or failing retries.
  • Containers table: label, service, image tag, uptime/running state—great for confirming versions and restart loops.

Grafana Dashboard: Traefik

Metrics from Traefik’s /metrics on :8081. This board tells you if problems start at the edge (routing/TLS/proxy) or deeper in the stack.

Figure: Traefik Grafana dashboard.

Key panels:

  • Average response time by service – end-to-end latency from Traefik to each upstream (n8n@docker, grafana@docker, etc.).
  • Requests by service / by protocol – traffic split across services and entrypoints (websecure, web, metrics, traefikping).
  • Status codes over 5m – stacked bars: 2xx, 3xx, 4xx, 5xx; plus “others status code” for non-200s.
  • Service focus panels – per-service response time, return-code pie, and total requests (great for drilling into n8n vs grafana).

Conclusion

With Prometheus and Grafana, you get the three essentials of observability for n8n: metrics, dashboards, and alerts. Start with the lean stack here, watch a week of real traffic, then tune thresholds and panels. When you need speed and repeatability, flip to the n8n-toolkit automation and let it handle installs, upgrades, backups, restores, and monitoring for you.

If you’d like a done-for-you setup (install, alert tuning, SLOs, dashboards for leadership), I can help.
