Everything you need to keep your services up

Eight things that matter, built the way you'd build them yourself if you had a few weekends. No PagerDuty-sized feature surface — just the parts that actually catch outages.

Probing

Multi-location consensus

Every monitor runs from the probe locations you choose — Phoenix, Buffalo, Dallas, Ukraine, and (on Enterprise) any region you want us to add. All locations check the same target on the same schedule. The monitor only flips to DOWN when every selected probe agrees the target is unreachable.

That single rule eliminates the whole category of false pages caused by one bad backbone route, a flapping Cogent peer, or a temporarily blocked datacenter IP. Other monitors will page you at 3 a.m. for a single Frankfurt timeout; Duck Stats sees that the other probes are fine and stays quiet.

  • Pick which locations check which target — internal-only API from Phoenix, public DNS from all five.
  • See per-probe latency graphs overlaid for the last 24 hours — spot a single bad region instantly.
  • Dissenting probe results are still logged and shown on the monitor detail page for diagnostics.
  • Set required-quorum manually if you want stricter (3-of-5) or looser (2-of-5) rules.
  • No extra cost — consensus is the default behavior on Starter and up.
Phoenix     200 OK · 42ms
Buffalo     200 OK · 71ms
Dallas      200 OK · 55ms
Ukraine     timeout
Phoenix-2   200 OK · 39ms
Verdict: UP · 4-of-5 reachable · Ukraine route logged for review
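
The consensus rule can be sketched in a few lines. This is a minimal illustration rather than Duck Stats' actual implementation: `ProbeResult`, `verdict`, and the `min_reachable` parameter are hypothetical names. The default (`min_reachable=1`) means the target is DOWN only when every selected probe fails.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ProbeResult:
    location: str
    reachable: bool
    latency_ms: Optional[float] = None  # None when the probe timed out


def verdict(results: list[ProbeResult], min_reachable: int = 1) -> str:
    """Return "UP" when at least `min_reachable` probes reached the target.

    With the default of 1, the verdict is "DOWN" only if every probe failed.
    """
    if not results:
        raise ValueError("at least one probe result is required")
    reachable = sum(1 for r in results if r.reachable)
    return "UP" if reachable >= min_reachable else "DOWN"


# The sample readings from the panel above: one timeout, four successes.
sample = [
    ProbeResult("phoenix", True, 42),
    ProbeResult("buffalo", True, 71),
    ProbeResult("dallas", True, 55),
    ProbeResult("ukraine", False),
    ProbeResult("phoenix-2", True, 39),
]
print(verdict(sample))                   # UP
print(verdict(sample, min_reachable=5))  # DOWN
```

Raising `min_reachable` is one way a stricter quorum could be expressed; the dissenting Ukraine result stays in `sample` either way, so it can still be logged.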
Server health

The server agent that isn't a daemon

One POSIX shell script. No Go binary, no systemd unit to debug, no Docker container to keep updated. curl https://stats.duckingstats.com/agent.sh | TOKEN=xxx sh drops a file in /usr/local/bin, registers a cron entry, and starts pushing every 60 seconds. It runs on anything that has sh, awk, and curl — that's every Linux box made since 2003 and most BSDs.

What it pushes: CPU usage, load average (1/5/15), RAM used + cached + buffers, swap, disk usage per mounted filesystem, network throughput per interface, optional process-pattern checks, and optional systemctl is-active service checks. CPU and disk pressure are info-only — they color the host yellow on the dashboard but they will never trigger a DOWN alert by themselves.

  • Per-host custom thresholds — a build box at 90% CPU is fine, a database at 90% is not.
  • Per-partition disk tracking — track a 12-disk NAS without 12 separate monitors.
  • Process check by pattern (pgrep -f nginx) with the last-seen timestamp in alert bodies.
  • Service check via systemctl is-active — catches crashes that don't change the process name.
  • Self-updates by re-curling itself on a weekly schedule; pin a version if you don't want that.
# cat /etc/cron.d/duckstats-agent
* * * * * root /usr/local/bin/duckstats-agent push

# duckstats-agent --once | head -8
host=db01.example.com
cpu=14.2% load1=0.31
ram_used=4.2G/16G
disk_/=68% disk_/data=41%
net_eth0_rx=1.2MB/s tx=890KB/s
proc_mysqld=running pid=1421
svc_mariadb=active
Pushed in 38ms · next push in 60s
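
Per-host thresholds and the info-only rule for CPU and disk pressure can be pictured like this. It is a sketch under assumptions: `host_color`, the metric names, and the "green" state are hypothetical, and only the yellow state comes from the description above.

```python
def host_color(metrics: dict[str, float], soft_caps: dict[str, float]) -> str:
    """Return "yellow" if any metric is above its soft cap, else "green".

    There is deliberately no "DOWN" outcome: pressure metrics are info-only.
    """
    breached = [name for name, value in metrics.items()
                if value > soft_caps.get(name, float("inf"))]
    return "yellow" if breached else "green"


reading = {"cpu_pct": 90.0, "disk_pct": 68.0}
print(host_color(reading, {"cpu_pct": 95.0}))  # build box: green
print(host_color(reading, {"cpu_pct": 80.0}))  # database: yellow
```

The same 90% CPU reading is fine on the build box and yellow on the database because each host carries its own cap.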
Public pages

Status pages with custom domains

Pick which monitors appear, group them under section headings, drop your logo, choose a color, and point a CNAME at our edge. We terminate TLS for free with a Let's Encrypt cert and serve the page in under 200ms from a static cache that's invalidated whenever a monitor changes state.

Premium accounts can run as many status pages as they want — separate ones for internal infra, customer-facing services, and partner integrations all share the same monitors but show different subsets. Each page gets its own URL, branding, and (optionally) password protection.

  • Custom subdomain (status.yourcompany.com) with automatic Let's Encrypt TLS.
  • Per-page monitor selection — show only what each audience needs to see.
  • Embedded latency overlay with one line per probe location.
  • 90-day uptime ribbons + per-monitor incident history.
  • JSON endpoint at /status/<slug>.json for embedding in your own dashboards.
api.yourco.com    UP · 99.98%
app.yourco.com    UP · 100.00%
cdn.yourco.com    DEGRADED · 99.4%
mail.yourco.com   UP · 99.91%
All systems operational (1 degraded)
Served from status.yourco.com · CNAME → edge.duckingstats.com
Alerts

Six notification channels, one consensus filter

Email, Discord, Pushover, ntfy, Microsoft Teams, and generic webhooks. Mix and match per monitor or per tag — page the on-call's phone via Pushover, drop a fancy embed in the team Discord, and CC the audit mailbox over SMTP, all from one configured monitor.

Every channel goes through the same throttle: a configurable minimum interval between alerts per monitor, plus a single notification when the monitor recovers. Loud failures, quiet flaps. The webhook channel sends a full JSON envelope including the failing probe location, latency, response body snippet, and a deep link back to the monitor detail page.

  • Native Discord rich embeds (color-coded UP/DOWN, latency, last-seen timestamp).
  • Email via Oracle Cloud SMTP, the same path as the support inbox, so alerts should pass SPF checks instead of landing in spam.
  • Pushover priority routing — info goes silent, critical goes screaming.
  • ntfy support for self-hosted push (point at your own ntfy server).
  • Webhook payloads include the raw probe response — debug from the alert, not the dashboard.
{
  "monitor": "api.example.com",
  "state": "DOWN",
  "consensus": "5-of-5 unreachable",
  "first_probe_down": "phoenix",
  "last_probe_down": "ukraine",
  "duration_ms": 23741,
  "response_snippet": null,
  "url": "https://stats.duckingstats.com/m/abc123"
}
DOWN alert · routed to: Discord + Pushover + email
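
One way to read the throttle described above, as a rough sketch: DOWN alerts are rate-limited per monitor, and a recovery notice goes out only if a DOWN alert was actually sent. `AlertThrottle` and its fields are hypothetical names, and timestamps are plain seconds for simplicity.

```python
from typing import Optional


class AlertThrottle:
    """Per-monitor throttle: minimum gap between DOWN alerts, one recovery notice."""

    def __init__(self, min_interval_s: float) -> None:
        self.min_interval_s = min_interval_s
        self.last_down_alert_at: Optional[float] = None
        self.alerted_down = False  # True while a sent DOWN alert awaits recovery

    def should_notify(self, state: str, now: float) -> bool:
        if state == "UP":
            # Announce recovery once, and only if a DOWN alert went out.
            recovered = self.alerted_down
            self.alerted_down = False
            return recovered
        # state == "DOWN": alert if none was sent yet or the interval has passed.
        if (self.last_down_alert_at is None
                or now - self.last_down_alert_at >= self.min_interval_s):
            self.last_down_alert_at = now
            self.alerted_down = True
            return True
        return False


throttle = AlertThrottle(min_interval_s=300)
events = [(0, "DOWN"), (60, "DOWN"), (120, "UP"), (150, "DOWN"), (180, "UP"), (400, "DOWN")]
print([throttle.should_notify(state, t) for t, state in events])
# [True, False, True, False, False, True]
```

The flap at 150s and 180s stays silent because it falls inside the 300-second interval, which is the "quiet flaps" half of the rule.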
Severity

Multi-tier alerting (warn vs critical, suppression)

Every monitor has two thresholds: warn (degraded — latency above N ms, partial probe agreement, host metric above soft cap) and critical (consensus DOWN, hard cap exceeded). Warn never wakes you up — it goes only to channels you mark as "info-tier" and shows yellow on the dashboard. Critical fires the loud channels.

Suppression windows mute alerts during planned maintenance — set a one-shot window from the UI or via the API, and the monitor will keep running but won't notify until the window expires. You'll still get the post-mortem timeline showing exactly what happened during the suppression.

  • Two severity tiers, independently routed.
  • Planned-maintenance suppression with auto-expiry.
  • Per-monitor minimum repeat interval (don't re-page every minute on a sustained outage).
  • "Recovered" notification on UP transitions, with downtime duration in the body.
  • CPU/disk pressure is always info-tier — it never escalates to a DOWN alert by design.
latency 850ms (limit 500)   WARN → Discord
3-of-5 probes timeout       WARN → Discord
5-of-5 probes DOWN          CRITICAL → all
disk /data at 92%           INFO → Discord
maintenance window          SUPPRESSED · 14m left
Warn channels: Discord · Critical channels: Pushover + email + Discord
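
The two tiers and the suppression window can be sketched as a classifier plus a router. The function names, the "OK" state, and the epoch-second timestamps are assumptions made for illustration; the channel lists mirror the sample panel above.

```python
from typing import Optional

# Channel lists taken from the sample panel above.
CHANNELS = {
    "WARN": ["discord"],
    "CRITICAL": ["pushover", "email", "discord"],
}


def classify(failed: int, total: int, latency_ms: Optional[float],
             latency_limit_ms: float) -> str:
    """Map one check round to a severity tier."""
    if total > 0 and failed == total:
        return "CRITICAL"  # consensus DOWN
    if failed > 0:
        return "WARN"      # partial probe agreement
    if latency_ms is not None and latency_ms > latency_limit_ms:
        return "WARN"      # slow but reachable
    return "OK"


def route(severity: str, now: float,
          suppressed_until: Optional[float] = None) -> list[str]:
    """Return the channels to notify; none inside a suppression window."""
    if suppressed_until is not None and now < suppressed_until:
        return []
    return CHANNELS.get(severity, [])


print(classify(0, 5, 850, 500))   # WARN
print(classify(3, 5, None, 500))  # WARN
print(classify(5, 5, None, 500))  # CRITICAL
print(route("CRITICAL", now=100, suppressed_until=940))  # []
```

Classification still runs during a suppression window, so the timeline keeps its record; only the routing step goes quiet.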
API

Auto-instrument from CI

Every monitor, status page, and notification channel is a REST resource. The API uses bearer tokens scoped per account and returns plain JSON with predictable pagination. Rate limiting is a token bucket (60 req/s on Premium, 600 req/s on Enterprise), high enough that you'd have to be trying to hit it.

The pattern that pays off most: a CI step that creates a monitor for every new service it deploys. Push a new container, watch the API create an HTTP monitor pointed at the new ingress URL, tag it with the service name and environment, and add it to the right status page — all in 200 lines of CI YAML. Tear down the monitor in the same pipeline when you decommission.

  • Create / read / update / delete monitors, status pages, notification channels, and tags.
  • Bulk endpoints — push 100 monitor configs in one POST.
  • Read-only access on Starter, full read+write on Premium and up.
  • OpenAPI spec at stats.duckingstats.com/openapi.yaml — generate clients for any language.
  • Webhook callbacks on monitor state changes for downstream automation.
$ curl -H "Authorization: Bearer $DS_TOKEN" \
    -H "Content-Type: application/json" \
    https://stats.duckingstats.com/api/monitors \
    -d '{
      "name":"api.new-service.com",
      "type":"http",
      "url":"https://api.new-service.com/healthz",
      "interval_s":60,
      "probes":["phoenix","buffalo","dallas"],
      "tags":["api","prod"]
    }'

{"id":"mon_8x7v","created":"2026-06-21T14:32:01Z"}
Monitor created · first check in 18s
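
For a CI step, the same call might look like this in Python. Treat it as a sketch: the endpoint and field names are copied from the curl example above, while the helper name, the tag scheme, and the `Content-Type` header are assumptions. The request is only built here, not sent.

```python
import json
import urllib.request

API_URL = "https://stats.duckingstats.com/api/monitors"  # from the curl example


def build_create_monitor_request(token: str, service: str, env: str,
                                 health_url: str, probes: list[str]) -> urllib.request.Request:
    """Build (but do not send) the POST that creates an HTTP monitor."""
    payload = {
        "name": service,
        "type": "http",
        "url": health_url,
        "interval_s": 60,
        "probes": probes,
        "tags": [service, env],  # hypothetical tagging scheme: service + environment
    }
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


req = build_create_monitor_request(
    token="example-token",
    service="api.new-service.com",
    env="prod",
    health_url="https://api.new-service.com/healthz",
    probes=["phoenix", "buffalo", "dallas"],
)
print(req.get_method(), req.full_url)
# In a real pipeline: urllib.request.urlopen(req, timeout=10)
```

Keeping the builder separate from the network call makes the step easy to unit-test in the pipeline before it ever touches the API.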
History

Heartbeat history + uptime rollups

Every individual check result — raw heartbeat, latency, which probe ran it, response detail — is kept for 7 days. Minutely stats roll up and stick around for 30 days. Hourly rollups last a full year. Daily rollups stay forever. That means you can answer "what was uptime in Q3 last year?" without paying for a separate analytics product.

The monitor detail page lets you scrub any window in the retention range and see every probe result, latency line, and notification fired. Status pages compute uptime percentages on the fly from the right rollup tier — fast queries for any window from "last hour" to "last 5 years".

  • 7 days raw heartbeats · 30 days minutely · 1 year hourly · daily forever.
  • Per-probe latency series on every monitor — see which region degraded when.
  • Status pages can show 90-day uptime ribbons computed from real history, not synthesized.
  • API endpoint returns raw or rollup tiers — pull into Grafana, Prometheus, or your own warehouse.
  • No surprise pruning — Free tier gets the same retention as Premium.
monitor: api.example.com · window: last 7 days

uptime  : 99.97% (3,019 / 3,020 checks UP)
incidents: 1
  └─ 2026-06-18 04:12 → 04:14 UTC (2m 09s)
     5-of-5 DOWN · resolved automatically

p50 latency: 48ms   p95: 112ms   p99: 287ms
worst probe : ukraine (p95 144ms)
best  probe : phoenix (p95  88ms)
Rollup tier: minutely · query took 12ms
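
Choosing a rollup tier for a query window can be sketched from the retention figures above. The selection rule is an assumption: this version picks the finest tier whose retention strictly exceeds the window, so a full 7-day window falls through to minutely. `TIERS` and `tier_for_window` are hypothetical names.

```python
from datetime import timedelta
from typing import Optional

# Retention per tier, finest first, as listed above. None means kept forever.
TIERS: list[tuple[str, Optional[timedelta]]] = [
    ("raw", timedelta(days=7)),
    ("minutely", timedelta(days=30)),
    ("hourly", timedelta(days=365)),
    ("daily", None),
]


def tier_for_window(window: timedelta) -> str:
    """Return the finest tier whose retention still covers the whole window."""
    for name, retention in TIERS:
        if retention is None or window < retention:
            return name
    raise AssertionError("unreachable: the last tier has no retention limit")


print(tier_for_window(timedelta(hours=1)))       # raw
print(tier_for_window(timedelta(days=7)))        # minutely
print(tier_for_window(timedelta(days=90)))       # hourly
print(tier_for_window(timedelta(days=5 * 365)))  # daily
```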
Push

Push monitors for cron jobs

For things that aren't reachable from the outside (nightly backups, batch jobs, ETL runs), create a push monitor and have the job request a heartbeat URL on success. If the heartbeat doesn't arrive within the expected window, the monitor flips DOWN like any other check and fires the configured alerts.

Healthchecks.io users will recognize this pattern. The difference: push monitors live in the same dashboard as your HTTP / TCP / agent monitors, share the same notification routing, show up on the same status pages, and bill against the same monitor count. One product, not two.

  • Configurable grace period — late by N minutes before alerting.
  • Optional body capture — POST job log output and see it in the heartbeat history.
  • Cron expression parsing — the UI tells you when the next check is expected.
  • Works with one-line additions to any cron entry: && curl -fsS $URL.
  • Same alert channels and suppression windows as every other monitor type.
# /etc/cron.d/nightly-backup
# cron does not support backslash line continuation, so keep the entry on one line
0 3 * * * root /usr/local/bin/backup.sh && curl -fsS https://stats.duckingstats.com/ping/push_abc123

# heartbeat history
2026-06-21 03:00:42 UTC  ✓ 42ms
2026-06-20 03:00:39 UTC  ✓ 39ms
2026-06-19 03:00:48 UTC  ✓ 48ms
2026-06-18 03:00:00 UTC  ✗ MISSED (grace 5m)
Next expected: tomorrow 03:00 UTC ± 5m
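
The missed-heartbeat check comes down to a deadline comparison. In this minimal sketch the caller supplies the expected run time (cron expression parsing is out of scope), and `is_missed` is a hypothetical name: a heartbeat counts if it arrived no earlier than the grace period before the expected time, and the run is missed once the grace period after it has passed with nothing received.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def is_missed(expected_at: datetime, grace: timedelta,
              last_heartbeat: Optional[datetime], now: datetime) -> bool:
    """True when the grace period has lapsed with no heartbeat for this run."""
    window_start = expected_at - grace
    deadline = expected_at + grace
    arrived = last_heartbeat is not None and last_heartbeat >= window_start
    return now > deadline and not arrived


utc = timezone.utc
expected = datetime(2026, 6, 18, 3, 0, tzinfo=utc)
grace = timedelta(minutes=5)
previous_run = datetime(2026, 6, 17, 3, 0, 40, tzinfo=utc)  # hypothetical earlier heartbeat

# Still inside the grace period: not missed yet.
print(is_missed(expected, grace, previous_run, datetime(2026, 6, 18, 3, 4, tzinfo=utc)))  # False
# Grace period over and only the previous night's heartbeat on record: missed.
print(is_missed(expected, grace, previous_run, datetime(2026, 6, 18, 3, 6, tzinfo=utc)))  # True
```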

At-a-glance vs the alternatives

Honest table. We're a small, opinionated service — we don't try to match the feature surface of the giants. Where they win, we mark it. Where we win, we mark that too.

Feature | Duck Stats | HetrixTools | UptimeRobot | Better Stack
Free tier | 15 monitors | 15 monitors | 50 monitors | 10 monitors
1-min checks on entry paid plan | $5 | $5–10 | $7 | $29
Multi-location consensus (DOWN only if all probes agree) | Built-in default | Multi-loc, not consensus | Multi-loc, not consensus | Multi-loc, not consensus
Server health agent | POSIX shell, no daemon | Own binary agent | Not offered | Via Logtail (separate product)
Process & service monitoring | pgrep + systemctl | Process list | | Via Logtail
Push monitors (cron heartbeats) | | | |
Custom-domain status pages | Premium+, free TLS | Business+ | Pro+ | All paid
Number of probe locations | 5 (Enterprise: 10+) | 35+ | 14 | 15+
SMS alerts | Email/Discord/push only | Paid add-on | Paid plans | Included
Phone-call escalation | | | |
On-call rotation / incident management | | | | Full
SSL / domain expiry monitoring | Via TCP checks | Built-in | Built-in | Built-in
API access on entry paid plan | Read-only @ $5 | Limited (column unclear)
Self-hosted option | Enterprise (column unclear)

Last reviewed June 2026 against public pricing pages. Plans change — verify with each provider before deciding. If you need SMS, phone-call escalation, or PagerDuty-style on-call rotations, Better Stack is the right choice — it's a different product class. If you need a global probe footprint (more than 5 regions), HetrixTools wins on geographic breadth. Where we beat everyone: multi-location consensus by default, a server agent that's literally just a shell script, and a $5 entry tier that already includes 1-minute checks and the server agent.

See it live, then sign up

Our own production status page runs on Duck Stats. Watch it for two minutes and you've seen everything above in action.
