Cron monitoring for solo devs, in under 5 minutes

heartbeats · cron · guide

You have cron jobs.

Maybe a nightly backup. Maybe a daily script that pulls from an API and dumps it into a database. Maybe a weekly digest email that goes to ten people. Maybe a thing that runs every five minutes to keep a free-tier dyno warm.

You wrote them, they work, you forgot about them. That’s the whole problem.

Cron jobs fail silently. The script throws halfway through, cron dutifully mails the error output to a MAILTO=root address that nobody reads, and life goes on. Three weeks later you discover your database has not been backed up since March. By then you can’t even reconstruct what failed, because the logs rotated out.

This post is how to set up a dead-man’s-switch on every one of those jobs in about five minutes per job, with Upwatch’s heartbeat monitoring. The free plan covers 10 of them.

The idea, in one paragraph

Instead of Upwatch checking on your job, your job checks in with Upwatch. At the end of every successful run, the job sends a single HTTP request to a unique URL. Upwatch remembers when that URL was last pinged. If too much time goes by without one, it pages you. That’s the whole concept.
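
If it helps to see that as code, here is a toy sketch of the logic; everything in it is made up for illustration, not Upwatch’s actual implementation:

import time

INTERVAL = 60 * 60       # the job is expected to run hourly
GRACE = 5 * 60           # how late it may be before we alert
last_ping = time.time()  # updated every time the ping URL is hit

def handle_ping():
    global last_ping
    last_ping = time.time()

def check_overdue():
    # the monitor runs this periodically, on its own clock
    if time.time() - last_ping > INTERVAL + GRACE:
        print("heartbeat overdue -> open incident, page someone")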

Setting up your first heartbeat

In the dashboard, go to Heartbeats → New Heartbeat. You’ll fill in three things:

  • Name — something you’ll recognize later. “Nightly DB backup”, “Stripe sync”, “Newsletter digest”.
  • Expected interval — how often the job is supposed to run. Anywhere from 60 seconds to 7 days.
  • Grace period — how late the job is allowed to be before we call it dead. Default is 5 minutes. Crank it up for jobs that take a long time and finish at variable times.

Save it, and you’ll see your ping URL:

https://upwatch.dev/ping/abcd1234efgh5678

That URL is the entire integration. Hit it when the job finishes. Don’t hit it when the job fails.

Wiring it into common setups

Bash / crontab. Append the curl to the end of the command. The && is the important part — only ping if the script exits clean.

0 3 * * * /usr/local/bin/backup.sh && curl -fsS -m 10 https://upwatch.dev/ping/YOUR_TOKEN
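
Those flags matter: -f makes curl treat an HTTP error response as a failed command, -sS suppresses the progress output while still printing real errors, and -m 10 caps the request at ten seconds so a hung ping can never wedge your cron job.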

Python script. The version below pings only on success: if run_backup() raises, the script dies before the ping line and the heartbeat goes quiet. Wrap the ping in a try/finally instead if you want it to fire regardless. Your call.

import urllib.request

run_backup()  # your actual job; any exception here skips the ping below
urllib.request.urlopen("https://upwatch.dev/ping/YOUR_TOKEN", timeout=10)
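
One detail worth knowing: urlopen raises on an HTTP error response (4xx or 5xx), so a failed ping surfaces as an exception at the end of the run instead of passing silently.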

GitHub Actions. Drop a step at the end of the workflow. Use if: success() so it only fires when everything worked.

- name: Ping Upwatch
  if: success()
  run: curl -fsS -m 10 --retry 3 https://upwatch.dev/ping/YOUR_TOKEN
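
The extra --retry 3 tells curl to retry transient network failures, so a momentary blip between the runner and Upwatch doesn’t get recorded as a missed run.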

Docker container with a healthcheck. Chain the ping onto the HEALTHCHECK command: when the local check passes, the same command hits the ping URL, so Upwatch hears from the container on every healthy check and goes quiet when it isn’t.
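
A minimal Dockerfile sketch, assuming your app exposes some local health endpoint (the localhost:8080/healthz URL here is a placeholder):

# Runs every 60s inside the container. The ping only fires if the
# local check succeeds, so Upwatch goes quiet when the app is sick.
HEALTHCHECK --interval=60s --timeout=15s --retries=3 \
  CMD curl -fsS http://localhost:8080/healthz \
      && curl -fsS -m 10 https://upwatch.dev/ping/YOUR_TOKEN \
      || exit 1

One trade-off to be aware of: a network blip on the ping will also mark the container unhealthy, since both commands share a single exit code.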

Anything else. It’s an HTTP GET (or POST, your pick). Any language, any platform, any runtime can do it. There is no SDK to install.

The /fail variant: signal explicitly when the job dies

Sometimes you want the script to announce that it failed instead of just going quiet. Use the /fail endpoint:

backup.sh || curl -fsS -m 10 https://upwatch.dev/ping/YOUR_TOKEN/fail

This pages you immediately on a failure, rather than waiting for the interval-plus-grace timeout. Useful for jobs that should never fail and where every minute of silence is a real outage.
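
In a wrapper script you can make the branch explicit. A sketch, reusing the backup.sh path from the crontab example:

#!/usr/bin/env bash
# Ping the success URL on a clean exit, the /fail URL on anything else.
if /usr/local/bin/backup.sh; then
  curl -fsS -m 10 https://upwatch.dev/ping/YOUR_TOKEN
else
  curl -fsS -m 10 https://upwatch.dev/ping/YOUR_TOKEN/fail
fi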

Choosing the interval and grace period

This is where most setups get tripped up. Two rules of thumb:

  1. Interval is the schedule, not the runtime. If the job runs every hour on the hour, the interval is 1 hour. It does not matter how long the job itself takes.
  2. Grace is your safety margin. Set it to longer than the worst-case duration of the job, plus some buffer. A job that usually takes 2 minutes but occasionally takes 20 minutes during peak load needs a 30-minute grace. Otherwise you’ll get false alerts every time the job is slow.

For very fast, very frequent jobs (every minute), keep the grace tight — 2–3 minutes. For long, infrequent jobs (nightly), be generous — 30 minutes to an hour. You can always tighten later.
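
A worked example: the 3 a.m. backup from the crontab line earlier gets a 24-hour interval and a 30-minute grace. If last night’s run pinged at 3:07 and tonight’s never does, the incident opens around 3:37 a.m.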

What alerts look like

When a heartbeat goes silent past interval + grace, Upwatch opens an incident and sends an alert to whichever channels you have set up — email, Slack, Discord, Telegram, SMS, or generic webhook. The alert tells you which heartbeat is overdue and when it was last seen. The moment your job pings successfully again, the incident auto-resolves and you get a “back up” notification. No manual ack required.

Things I wish I’d known sooner

A few non-obvious things that have bitten me:

  • Don’t ping at the start of the job, ping at the end. Pinging at the start tells you the cron daemon fired. That’s not what you care about. What you care about is whether the job finished.
  • For idempotent jobs, you can ping on success or already-done. A backup script that detects an existing backup for today and exits early should still ping. The job is “done” in the sense that matters.
  • A job that takes longer than its interval is a bug, not an alert. If your “hourly” job takes 70 minutes, you’re going to get spurious alerts forever. Fix the job, don’t widen the grace.
  • Watch for clock drift. Upwatch timestamps pings on its own clock, but when your cron job fires depends on your server’s clock. NTP is on by default on most modern Linux distros, but a minimal VM or a host without it can drift, and your jobs will start pinging “early” or “late” by minutes.

Why this is better than checking your logs

Logs tell you what happened. They don’t tell you what didn’t happen. A cron job that silently stopped running three weeks ago leaves no entry in the logs — by definition. The only way to catch silent failure is to require positive evidence of liveness on a schedule, which is exactly what a heartbeat is.

That’s the whole pitch. Ten heartbeats are free forever. If you’re a solo dev with a half-dozen scheduled scripts, this is probably the highest-leverage monitoring you can put in place. Start with one — your oldest cron job — and add the others as you remember them.