Heartbeat Monitoring

Your crons are failing.
Silently.

A heartbeat check is a dead-man's switch for your scheduled jobs. Your job pings a URL when it finishes. If the ping never arrives — process crashed, server rebooted, job hung — SteadyCron wakes you up.

Start free · Read the docs

Free tier — no credit card required.

Without a heartbeat check

# /etc/cron.d/backup
0 2 * * * root /usr/bin/backup.sh

# Output: none.
# Exit code: who knows.
# Last successful run: unclear.
# Will anyone notice? No.

With a heartbeat check

# /etc/cron.d/backup
# Keep the entry on one line: cron does not support "\" line continuation.
0 2 * * * root /usr/bin/backup.sh && curl -fsS https://ping.steadycron.com/abc123

✓ 02:04 — ping received, job healthy
✗ 03:02 — no ping — alerting #ops-slack

The problem

Fire-and-forget is not a monitoring strategy

Cron has no built-in concept of success or failure. A job runs, produces output no one reads, and exits — and the scheduler considers its job done regardless. By the time you notice something went wrong, the logs have rotated, the window has closed, and the damage is done.

The nightly backup silently failed. The data is gone.
The invoice job hasn't run in three days. Finance noticed first.
A script hung and blocked the next run — for six hours.
The process was OOM-killed. Logs already rotated.

A typical cron failure timeline

02:00  Job scheduled to run
02:00  Process started
02:03  Disk full, script exits 1
02:03  Output: none (redirected)
03:00  Log rotation runs
09:14  Engineer opens dashboard
09:14  No visible errors anywhere
14:30  Customer reports data loss

More than 12 hours between failure and discovery.

How it works

Silence is the signal

You don't push data to us — your job does. If it stops, we know.

01

Your job pings a URL

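After your script runs, it calls a unique ping URL with a single HTTP GET, nothing more. Optionally ping /start before the job begins, /success when it finishes, and /fail on error. In the examples on this page, success is reported by calling the bare ping URL.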

curl -fsS https://ping.steadycron.com/{id}

02

We open a grace window

SteadyCron knows your schedule and allows a configurable grace period to absorb normal jitter — midnight maintenance windows, server load spikes, and schedule drift.

grace: 10m # no alert if ping arrives within 10 min

03

Silence fires an alert

If the ping doesn't arrive before the grace period expires, an alert is sent to your configured channel — Slack, email, Telegram, Discord, or webhook.

→ #ops-slack nightly-backup missed (14 h overdue)

What we detect

Three ways a job can go wrong

Each failure type is distinct, detectable, and alerts independently.

nightly-backup MISSED

Missed run

The job never started. The process was killed, the server rebooted, or the scheduler skipped it. No ping arrived — not even a /fail.

Expected at 02:00 UTC

Last ping yesterday 02:04

Overdue by 14 h 22 min

Detected when: no /start or /success ping arrives within the grace window of the expected schedule.

invoice-sender FAILED

Job failed

The job ran but exited non-zero, or your code caught an exception and explicitly called the /fail endpoint before returning.

...
$ python invoice.py
ConnectionError: DB unreachable
→ /fail ping sent
Alerted: #billing-alerts

Detected when: your code calls the /fail endpoint, or the job exits without a /success ping.

report-generator TIMED OUT

Hung job

The job started — /start was received — but never sent /success. It's still running, or it deadlocked silently. A max-duration threshold catches it.

/start received 09:00:12

Max duration 30 min

/success received never

Detected when: a /start was received but /success or /fail never arrives within the configured max duration.
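The three "Detected when" rules above can be read as one small decision function. The following Python sketch is illustrative only and is not SteadyCron's implementation: `CheckState`, `classify`, the field names, and the "PENDING" and "HEALTHY" labels are hypothetical names invented for this example.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class CheckState:
    """Hypothetical snapshot of the pings seen for one expected run."""
    expected_at: datetime                     # when the schedule says the job should run
    grace: timedelta                          # allowed lateness before a run counts as missed
    max_duration: timedelta                   # allowed runtime after /start
    started_at: Optional[datetime] = None     # time of /start, if received
    succeeded_at: Optional[datetime] = None   # time of the success ping, if received
    failed_at: Optional[datetime] = None      # time of /fail, if received


def classify(state: CheckState, now: datetime) -> str:
    """Return 'FAILED', 'HEALTHY', 'TIMED OUT', 'MISSED', or 'PENDING'."""
    if state.failed_at is not None:
        return "FAILED"
    if state.succeeded_at is not None:
        return "HEALTHY"
    if state.started_at is not None:
        # Started but never finished: hung once the max duration is exceeded.
        if now - state.started_at > state.max_duration:
            return "TIMED OUT"
        return "PENDING"
    # No ping of any kind: missed once the grace window has closed.
    if now > state.expected_at + state.grace:
        return "MISSED"
    return "PENDING"
```

Checking /fail first means an explicit failure wins over every other signal, which matches the idea that each failure type alerts independently.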

Integration

One line. Any language.

No SDK to install, no agent to deploy. An HTTP GET is the integration. Works from a shell script, a Python lambda, a Go binary, or a Kubernetes CronJob.

# Minimal — one line at the end of your crontab command
0 2 * * * /usr/bin/backup.sh && curl -fsS https://ping.steadycron.com/abc123

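# With start/fail signals (recommended). Keep the entry on one line: cron has no "\" continuation.
# The ";" after the start ping lets the job run even if that ping fails.
0 2 * * * curl -fsS https://ping.steadycron.com/abc123/start; /usr/bin/backup.sh && curl -fsS https://ping.steadycron.com/abc123 || curl -fsS https://ping.steadycron.com/abc123/fail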

import subprocess
import sys

import requests

PING = "https://ping.steadycron.com/abc123"


def ping(url):
    """Send a heartbeat; a monitoring outage must never break the job."""
    try:
        requests.get(url, timeout=5)
    except requests.RequestException:
        pass


ping(f"{PING}/start")
result = subprocess.run(["/usr/bin/backup.sh"])

if result.returncode == 0:
    ping(PING)  # success
else:
    ping(f"{PING}/fail")
    sys.exit(result.returncode)

const PING = "https://ping.steadycron.com/abc123";

// Never let a failed ping break the job itself.
const ping = (url) => fetch(url).catch(() => {});

await ping(`${PING}/start`);
try {
  await runMyJob();
} catch (err) {
  await ping(`${PING}/fail`);   // explicit failure
  throw err;
}
await ping(PING);               // success

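// Assumes imports "log" and "net/http"; runMyJob is your own function.
const ping = "https://ping.steadycron.com/abc123"

// send fires a ping and closes the response body. Errors are ignored
// so a monitoring outage never breaks the job.
func send(url string) {
	if resp, err := http.Get(url); err == nil {
		resp.Body.Close()
	}
}

func main() {
	send(ping + "/start")
	if err := runMyJob(); err != nil {
		send(ping + "/fail")
		log.Fatal(err)
	}
	send(ping) // success
}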

require "net/http"
PING = "https://ping.steadycron.com/abc123"

# Never let a failed ping break the job itself.
def ping(url)
  Net::HTTP.get(URI(url))
rescue StandardError
  nil
end

ping("#{PING}/start")
begin
  run_my_job
rescue
  ping("#{PING}/fail")
  raise
end
ping(PING)                          # success

<?php
const PING = "https://ping.steadycron.com/abc123";

file_get_contents(PING . "/start");
try {
    run_my_job();
    file_get_contents(PING);          // success
} catch (\Throwable $e) {
    file_get_contents(PING . "/fail");
    throw $e;
}

The ping URL is generated when you create the check — copy it into your script. Prefer to define monitors as code? Declare them in YAML or Terraform.
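If several jobs share the same start/success/fail pattern, it can be wrapped once and reused. The sketch below is one possible way to do that in Python using only the standard library. `heartbeat` and `http_get` are hypothetical helper names, not part of any SteadyCron SDK, and the URL layout (`/start`, bare URL for success, `/fail`) is taken from the examples above.

```python
import urllib.request
from contextlib import contextmanager


def http_get(url: str) -> None:
    """Send one GET; swallow errors so a monitoring outage never breaks the job."""
    try:
        with urllib.request.urlopen(url, timeout=5):
            pass
    except Exception:
        pass


@contextmanager
def heartbeat(ping_url: str, send=http_get):
    """Ping /start on entry, the bare URL on success, and /fail on an exception."""
    send(f"{ping_url}/start")
    try:
        yield
    except Exception:
        send(f"{ping_url}/fail")
        raise  # keep the original error visible to the caller
    else:
        send(ping_url)
```

A job would then be wrapped as `with heartbeat("https://ping.steadycron.com/abc123"): run_my_job()`. The `send` parameter exists only so the helper can be exercised without network access.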

Smart alerting · Grace periods

Not every delay is a failure

Schedules are aspirational. Server load, midnight maintenance windows, and leap-second handling all push jobs a few minutes past their scheduled time. A grace period absorbs this normal jitter so you're only paged when something is genuinely wrong.

Set it per check — a quick 5-minute health ping might use a 2-minute grace; a heavy nightly ETL that sometimes takes 20 minutes might use 30.

Grace period in action

02:00  Expected ping
02:06  Ping arrives (6 min late): within 10 min grace, no alert
03:00  Expected ping
03:11  No ping (11 min late): grace expired, alerting

grace: 600 # seconds — 10 minutes
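The arithmetic behind the timeline above is just "expected time plus grace". A minimal Python sketch follows, assuming the grace period is given in seconds as in the `grace: 600` example. `is_overdue` is a hypothetical name and the function only illustrates the rule, not how SteadyCron evaluates it internally.

```python
from datetime import datetime, timedelta
from typing import Optional


def is_overdue(expected: datetime, grace_seconds: int,
               last_ping: Optional[datetime], now: datetime) -> bool:
    """True when no ping for this run has arrived and the grace window has closed."""
    deadline = expected + timedelta(seconds=grace_seconds)
    # A ping counts for this run only if it arrived at or after the expected time.
    ping_arrived = last_ping is not None and last_ping >= expected
    return not ping_arrived and now > deadline
```

With a 600-second grace, a ping at 02:06 for the 02:00 run is on time, while a 03:00 run with no ping is overdue from 03:10 onward.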

Smart alerting · Noise reduction

Built to reduce noise, not add to it

Alert fatigue is real. SteadyCron has multiple mechanisms to keep your pager quiet unless something genuinely needs your attention.

Consecutive failure threshold

A single blip (a momentary network hiccup, a one-off OOM kill) should not page your team at 3 AM. Set a threshold of N consecutive failures before an alert fires. With a threshold of 3, the first two failures are only logged; the third fires the alert.


alert_after: 3 # only page after 3 consecutive misses
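The threshold behaves like a counter that resets on every success. Here is a small Python sketch of that idea. `FailureThreshold` and `record` are hypothetical names, and the choice to fire exactly once, on the Nth failure, is an assumption of this example rather than documented behavior.

```python
class FailureThreshold:
    """Fire an alert only on the Nth consecutive failure; any success resets the count."""

    def __init__(self, alert_after: int = 3):
        self.alert_after = alert_after
        self.consecutive = 0

    def record(self, ok: bool) -> bool:
        """Record one run result; return True if this result should fire an alert."""
        if ok:
            self.consecutive = 0
            return False
        self.consecutive += 1
        # Equality (not >=) means later failures in the same streak do not re-alert.
        return self.consecutive == self.alert_after
```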

Flapping guard

A job that rapidly alternates between healthy and failing generates a storm of alerts with no useful signal. The flapping guard detects this oscillation and suppresses repeat notifications until the job stabilises — one alert, not twenty.

 # fail → ok → fail → ok → fail
 ⚠ flapping detected — alert held

Quiet hours

Mark a time window per check where non-critical failures are suppressed — the alert is logged but not delivered. When your team is back online, everything that happened is in the delivery log with its suppression reason.


quiet_hours: "22:00–07:00 Europe/Berlin"
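A window such as 22:00 to 07:00 crosses midnight, so a plain "start <= t < end" comparison is not enough. The Python sketch below handles both cases. `in_quiet_hours` is a hypothetical helper, and it assumes the caller has already converted the event time to the check's time zone (Europe/Berlin in the example above).

```python
from datetime import time


def in_quiet_hours(local_time: time, start: time, end: time) -> bool:
    """True if local_time falls in [start, end), including windows that cross midnight."""
    if start <= end:
        return start <= local_time < end
    # Window wraps past midnight, e.g. 22:00 to 07:00.
    return local_time >= start or local_time < end
```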

Alert lifecycle

🔴  nightly-backup missed
    02:14, alert sent to #ops-slack
📋  Alert suppressed (quiet hours)
    03:00, subsequent miss logged, not sent
✅  nightly-backup recovered
    02:07 next day, auto-resolved
🔕  #ops-slack notified: resolved
    No manual acknowledgement needed

Smart alerting · Auto-resolve

Alerts resolve themselves

When a job recovers — the next ping arrives successfully — SteadyCron automatically resolves the open alert and sends a resolution notification to the same channel. You never need to manually close an incident that has already fixed itself.

Every alert event is logged with its trigger, delivery channel, status, and suppression reason if applicable — so you always have a complete record of what fired, what was held, and why.
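The open-then-resolve behavior described above can be modeled as a two-state machine with a log. The following Python sketch is only an illustration. `AlertLifecycle`, its methods, and the log format are hypothetical and do not reflect SteadyCron's actual data model.

```python
from typing import Callable, List, Tuple


class AlertLifecycle:
    """Hypothetical sketch: one open alert per check, resolved by the next success."""

    def __init__(self, notify: Callable[[str], None]):
        self.notify = notify                      # delivers a message to the channel
        self.alert_open = False
        self.log: List[Tuple[str, str]] = []      # (event, delivery status)

    def on_miss(self, check: str) -> None:
        if self.alert_open:
            # Already alerted: record the event but do not notify again.
            self.log.append((f"{check} missed", "held: alert already open"))
            return
        self.alert_open = True
        self.notify(f"{check} missed")
        self.log.append((f"{check} missed", "sent"))

    def on_success(self, check: str) -> None:
        if not self.alert_open:
            return                                # nothing to resolve
        self.alert_open = False
        self.notify(f"{check} recovered")
        self.log.append((f"{check} recovered", "sent"))
```

Every event lands in the log with its delivery status, whether or not a notification was sent, so the record stays complete.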

One platform, three jobs

HTTP Execution

Run your endpoints on schedule — retries, timeouts, and full run logs.

Heartbeat Monitoring

Watch jobs you run anywhere; get alerted the moment one goes silent.

Cron as Code

Define every job, monitor, and alert in YAML or Terraform.

Your jobs should tell you they ran.

Add a heartbeat check in two minutes. No agent, no SDK, no infrastructure change — just a URL at the end of your script.

Free tier — no credit card
EU-hosted, GDPR-native
Any language, any platform