Monitoring
Purpose
Describe how Get2Dial services expose health and metrics, and what to watch.
Overview
The control plane exposes three unauthenticated endpoints:
- GET /healthz: liveness.
- GET /readyz: readiness; aggregates the Postgres, Redis and NATS health checkers registered at boot.
- GET /metrics: Prometheus metrics (the middleware chain is CORS → request-logger with trace ID → metrics instrumentation → mux).
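The way a readiness endpoint folds several dependency checks into one answer can be sketched as follows. This is an illustration only, not the Get2Dial implementation: the `aggregate_readiness` function, the checker signature, and the 200/503 status mapping are all assumptions made for the example.

```python
from typing import Callable, Dict, Optional, Tuple

# Hypothetical checker type: returns None when healthy, or an error string.
Checker = Callable[[], Optional[str]]


def aggregate_readiness(checkers: Dict[str, Checker]) -> Tuple[int, Dict[str, str]]:
    """Run every registered checker and fold the results into one response.

    Returns an (HTTP status, per-dependency detail) pair. The 200/503
    mapping is an assumption, not taken from the Get2Dial source.
    """
    detail: Dict[str, str] = {}
    for name, check in checkers.items():
        try:
            error = check()
        except Exception as exc:  # a crashing checker counts as unhealthy
            error = str(exc)
        detail[name] = "ok" if error is None else f"error: {error}"
    status = 200 if all(v == "ok" for v in detail.values()) else 503
    return status, detail


if __name__ == "__main__":
    # Stand-in checkers; real ones would ping Postgres, Redis and NATS.
    checkers = {
        "postgres": lambda: None,
        "redis": lambda: None,
        "nats": lambda: "connection refused",
    }
    print(aggregate_readiness(checkers))
```

Returning the per-dependency detail alongside the status makes it easier to see which dependency caused a readiness failure, although whether /readyz actually reports that detail is not stated here.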
There is also GET /api/v1/edge/health, a public aggregator that probes the
edge stack (control, callengine, nodeagent, OpenSIPS, rtpengine, FreeSWITCH) via
the EDGE_HEALTH_* URLs.
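A rough sketch of how an aggregator might derive its probe targets from EDGE_HEALTH_* variables is shown below. The variable suffixes used in the demo (for example `EDGE_HEALTH_OPENSIPS`), the function names, and the rule that any 2xx response means healthy are assumptions for illustration; the real aggregator may behave differently.

```python
import os
import urllib.request
from typing import Callable, Dict, Mapping

PREFIX = "EDGE_HEALTH_"


def edge_targets(env: Mapping[str, str]) -> Dict[str, str]:
    """Map component name -> URL for every non-empty EDGE_HEALTH_* variable."""
    return {
        key[len(PREFIX):].lower(): url
        for key, url in env.items()
        if key.startswith(PREFIX) and url
    }


def http_probe(url: str, timeout: float = 2.0) -> bool:
    """Return True when the URL answers with a 2xx status (assumed rule)."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return 200 <= resp.status < 300
    except Exception:  # refused, timed out, bad URL: all count as down
        return False


def edge_health(
    env: Mapping[str, str],
    probe: Callable[[str], bool] = http_probe,
) -> Dict[str, bool]:
    """Probe every configured edge component and report up/down per name."""
    return {name: probe(url) for name, url in edge_targets(env).items()}


if __name__ == "__main__":
    # Reads whatever EDGE_HEALTH_* variables are set in the environment.
    for name, healthy in edge_health(os.environ).items():
        print(f"{name}: {'up' if healthy else 'down'}")
```

Passing the probe as a parameter keeps the aggregation logic testable without network access.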
Configuration
Container health checks hit the served endpoints, e.g.:
    healthcheck:
      test: ["CMD", "wget", "--spider", "--quiet", "http://localhost:8080/healthz"]
      interval: 10s
      timeout: 5s
      retries: 5
Set LOG_LEVEL (default info) and LOG_FORMAT (default json).
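For a helper script or sidecar that should follow the same logging conventions, the two variables could be honored roughly as below. Only the variable names and their defaults (info, json) come from this page; the accepted level names, the plain-text fallback format, the JSON field names, and the `get2dial` logger name are assumptions.

```python
import json
import logging
import os
import sys
from typing import Mapping

# Assumed set of accepted LOG_LEVEL values; only the default is documented.
LEVELS = {
    "debug": logging.DEBUG,
    "info": logging.INFO,
    "warn": logging.WARNING,
    "warning": logging.WARNING,
    "error": logging.ERROR,
}


class JsonFormatter(logging.Formatter):
    """Render each record as one JSON object per line (field names assumed)."""

    def format(self, record: logging.LogRecord) -> str:
        return json.dumps(
            {
                "level": record.levelname.lower(),
                "logger": record.name,
                "msg": record.getMessage(),
            }
        )


def configure_logging(env: Mapping[str, str] = os.environ) -> logging.Logger:
    """Build a logger from LOG_LEVEL and LOG_FORMAT, using the documented defaults."""
    level = LEVELS.get(env.get("LOG_LEVEL", "info").lower(), logging.INFO)
    fmt = env.get("LOG_FORMAT", "json").lower()

    handler = logging.StreamHandler(sys.stderr)
    if fmt == "json":
        handler.setFormatter(JsonFormatter())
    else:  # any other value falls back to a plain-text line (assumption)
        handler.setFormatter(logging.Formatter("%(levelname)s %(name)s %(message)s"))

    logger = logging.getLogger("get2dial")  # hypothetical logger name
    logger.handlers[:] = [handler]  # replace handlers so repeated calls do not stack
    logger.setLevel(level)
    return logger


if __name__ == "__main__":
    configure_logging().info("logging configured")
```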
Examples
Key signals to alert on:
- /readyz failing: a dependency (Postgres/Redis/NATS) is down.
- Dialer pacing stalls (no originates while leads remain): often a NODE_ID mismatch between pacer target and edge subscription.
- NATS disconnects between control plane and an edge.
- Abandonment rate approaching a campaign’s abandonment_rate_cap.
- A trace ID is propagated end to end so a single call can be followed across services.
- Treat the migration version as a first-class health signal after upgrades.
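The abandonment signal in the list above can be turned into a simple threshold check. The sketch below rests on several assumptions: that the rate is abandoned calls divided by answered calls, that abandonment_rate_cap is stored as a fraction (0.03 for 3%), and that "approaching" means reaching 80% of the cap. None of these are confirmed by this page, so adjust them to the actual campaign definition.

```python
def abandonment_rate(abandoned: int, answered: int) -> float:
    """Abandoned calls as a fraction of answered calls (assumed definition)."""
    if answered <= 0:
        return 0.0  # no answered calls yet, so nothing to measure
    return abandoned / answered


def abandonment_alert(
    abandoned: int,
    answered: int,
    cap: float,
    warn_fraction: float = 0.8,  # hypothetical "approaching" threshold
) -> str:
    """Classify a campaign as 'ok', 'warning' or 'critical' against its cap."""
    rate = abandonment_rate(abandoned, answered)
    if rate >= cap:
        return "critical"
    if rate >= cap * warn_fraction:
        return "warning"
    return "ok"


if __name__ == "__main__":
    # 25 abandoned out of 1000 answered is 2.5%, close to a 3% cap.
    print(abandonment_alert(abandoned=25, answered=1000, cap=0.03))
```

Warning before the cap is reached leaves time to slow pacing rather than reacting only after the cap has been breached.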