Exit code 137: killed by SIGKILL (usually out of memory)

Exit code 137 means the process was terminated by SIGKILL (signal 9), because 137 = 128 + 9 under the shell convention that a process killed by signal N reports 128 + N. The most common cause is the Linux out-of-memory (OOM) killer or a container memory limit (cgroup) killing the process when it exceeds its allowed memory. A plain retry usually will not help, because the job hit a resource ceiling rather than a transient error: you have to reduce the job's memory use or raise the limit.

Why does exit code 137 happen?

Exit code 137 is a signal-termination code. Under the shell convention used by Bash and most job runners, a process killed by signal N exits with status 128 + N. 137 - 128 = 9, and signal 9 is SIGKILL — a signal that cannot be caught, blocked, or ignored, so the process dies immediately with no chance to clean up.
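
The arithmetic is easy to check in a shell. The sketch below is only an illustration: it starts a throwaway child shell that sends SIGKILL to itself, then decodes the status the parent observes. The variable names status and signal are made up for the example.

```shell
# Run a throwaway child shell that kills itself with SIGKILL.
# "|| status=$?" captures the status without tripping "set -e" in the caller.
status=0
sh -c 'kill -s KILL "$$"' || status=$?

echo "exit status: $status"      # 128 + 9

# Statuses above 128 conventionally mean "killed by signal (status - 128)".
signal=$((status - 128))
echo "signal number: $signal"
kill -l "$signal"                # prints the signal name for that number
```

Some shells also print a "Killed" notice on stderr when the child dies; that notice comes from the parent shell, not from the killed process.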

Something sent that SIGKILL. In a scheduled job the usual senders are:

  • The Linux OOM killer: when the machine runs out of memory, the kernel picks a process and kills it with SIGKILL to reclaim memory. You'll see an "Out of memory: Killed process" line in dmesg or the kernel log.
  • A container memory limit: in Docker or Kubernetes, exceeding the container's memory limit (its cgroup limit) triggers an OOM kill of the offending process, surfacing as exit 137 on the container.
  • A manual or orchestrated kill -9: something (a deploy script, a watchdog, an operator) sent SIGKILL directly.
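
To see which ceiling applies to a containerized job, you can read the cgroup memory limit from inside the container. This is a rough sketch that assumes the usual mount points for cgroup v2 and v1; the function name cgroup_memory_limit is hypothetical, and some hosts expose neither file (for example, a process in the root cgroup).

```shell
# Print the memory limit that applies to the current cgroup, if one is visible.
cgroup_memory_limit() {
  if [ -r /sys/fs/cgroup/memory.max ]; then
    # cgroup v2: a byte count, or the word "max" when no limit is set
    cat /sys/fs/cgroup/memory.max
  elif [ -r /sys/fs/cgroup/memory/memory.limit_in_bytes ]; then
    # cgroup v1: a byte count (a very large number when no limit is set)
    cat /sys/fs/cgroup/memory/memory.limit_in_bytes
  else
    echo "unknown"
  fi
}

limit=$(cgroup_memory_limit)
echo "memory limit: $limit"
```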

How do I confirm it was the OOM killer?

Check the kernel log for the OOM killer's own message. It names the killed PID and the memory it was using:

dmesg -T | grep -i -E "killed process|out of memory"
journalctl -k --since "1 hour ago" | grep -i "out of memory"
# In Kubernetes, the pod's last state shows the reason:
kubectl describe pod <pod> | grep -A3 "Last State"   # look for Reason: OOMKilled

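If you see "Out of memory: Killed process", the kernel OOM killer did it. If a Kubernetes pod shows Reason: OOMKilled, the container was OOM-killed, typically because it hit its memory limit. If neither appears and the log still covers the time of the failure, something else most likely sent SIGKILL explicitly.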
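
The decision above can be wrapped in a small helper for a runbook. This is a minimal sketch, not a complete detector: oom_evidence is a hypothetical function name, it only looks for the phrases quoted above, and it cannot tell you anything about logs that have already rotated away.

```shell
# Hypothetical helper: reads kernel-log text on stdin and reports whether it
# contains the OOM killer's message.
oom_evidence() {
  if grep -qiE 'out of memory|killed process'; then
    echo "OOM kill found in kernel log"
  else
    echo "no OOM line found: check for an explicit kill -9"
  fi
}

# Self-check with the phrase the kernel logs, and with unrelated text.
printf 'Out of memory: Killed process\n' | oom_evidence
printf 'unrelated kernel message\n' | oom_evidence

# Real use (reading the kernel log may need root):
#   dmesg -T | oom_evidence
# For a Docker container, the recorded state carries an OOM flag:
#   docker inspect --format '{{.State.OOMKilled}}' <container>
```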

How do I fix exit code 137?

  • Reduce peak memory: stream or batch instead of loading everything into memory at once, process records in chunks, and free large buffers as soon as you're done with them.
  • Raise the ceiling: increase the container/cgroup memory limit or move the job to a larger machine if the working set is genuinely that large.
  • Cap concurrency: if the job forks workers, each worker holds its own memory — fewer parallel workers means a lower peak.
  • Don't just retry: because 137 is a resource ceiling, an automatic retry usually gets killed the same way. Fix the memory pressure first.
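
As one way to cap concurrency, a job that fans out over many inputs can hand them to xargs with a fixed number of parallel workers. The sketch below is illustrative: the part-N item names and the WORKERS variable are made up, and the echo stands in for the memory-heavy step. The -P option is not in POSIX but is supported by GNU and BSD xargs.

```shell
# Assumption: pick WORKERS so that WORKERS x per-worker memory stays under the limit.
WORKERS=2

# Each input item is handled by its own short-lived worker; at most
# $WORKERS of them run at the same time.
results=$(printf '%s\n' part-1 part-2 part-3 part-4 |
  xargs -n 1 -P "$WORKERS" sh -c 'echo "processed $1"' _)

echo "$results"
```

The output order can vary between runs because the workers finish independently.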

How do I get alerted when a job exits 137?

An OOM-killed job often dies mid-run, so it never reports success — which a heartbeat monitor reads as a miss and alerts on. Have the job ping a monitor only on a clean exit, so a SIGKILL withholds the ping:

# Ping only on a zero exit (&&, not ;). A SIGKILL-ed job sends no ping,
# and the monitor alerts you that the run missed.
/usr/local/bin/nightly.sh && curl -fsS -m 10 --retry 3 "https://ping.cronshield.com/<your-check-id>"

On the free tier the alert tells you the run missed. On paid tiers, CronShield reads the failing run's logs and puts the exit-137 / OOM evidence and a likely cause in the alert itself. The URL containing <your-check-id> is a placeholder for the endpoint you get on a monitor; the CronShield receiver ships in an upcoming release.
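
If you also want the job's own log to say why a run failed, a thin wrapper can decode the exit status before deciding whether to ping. This is a sketch under assumptions: describe_exit is a hypothetical name, and the wrapper only helps when the kernel kills the job rather than the wrapper itself.

```shell
# Hypothetical wrapper: runs a command and describes how it ended.
describe_exit() {
  status=0
  "$@" || status=$?
  if [ "$status" -eq 0 ]; then
    echo "ok"
  elif [ "$status" -gt 128 ]; then
    # By convention, statuses above 128 mean "killed by signal (status - 128)".
    echo "killed by signal $((status - 128))"
  else
    echo "failed with exit status $status"
  fi
  return "$status"
}

# Demo with a child that sends itself SIGKILL, standing in for an OOM-killed job.
report=$(describe_exit sh -c 'kill -s KILL "$$"') || true
echo "$report"
```

In a real cron entry the ping would be chained after the wrapper with &&, so that only a clean run reaches the monitor.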

Catch this failure automatically

The free tier gives you a heartbeat endpoint and an email alert when an expected ping doesn't arrive. Paid tiers add the log-aware diagnosis — the last log line and a likely cause in the alert. The heartbeat receiver ships in an upcoming release; see the plans to learn what each tier adds.

Frequently asked questions

Is exit code 137 always out of memory?

No, but it usually is. 137 specifically means SIGKILL (signal 9). The OOM killer and container memory limits are the most common senders of SIGKILL to a batch job, but a manual kill -9 or a watchdog can also produce it. Check the kernel log for an "Out of memory" line to confirm.

What's the difference between exit 137 and exit 143?

Both are signal terminations. 137 = 128 + 9 = SIGKILL (an immediate, uncatchable kill, often OOM). 143 = 128 + 15 = SIGTERM (a polite termination the process can handle for a graceful shutdown). A 137 job was killed hard; a 143 job was asked to stop.
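
The difference shows up when a process installs a handler. The sketch below is illustrative only: it uses throwaway child shells that signal themselves, and the variable names are made up for the example.

```shell
# Uncaught SIGTERM: the child dies and the parent sees 128 + 15.
term_status=0
sh -c 'kill -s TERM "$$"' || term_status=$?
echo "uncaught SIGTERM: $term_status"

# Trapped SIGTERM: the handler runs, cleans up, and picks its own exit status,
# so the "exit 1" after the kill is never reached.
trapped_status=0
trapped_output=$(sh -c 'trap "echo cleaning up; exit 0" TERM; kill -s TERM "$$"; exit 1') || trapped_status=$?
echo "trapped SIGTERM: $trapped_status ($trapped_output)"

# SIGKILL: no handler gets a chance to run, and the parent sees 128 + 9.
kill_status=0
sh -c 'kill -s KILL "$$"' || kill_status=$?
echo "SIGKILL: $kill_status"
```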