tree 4e6b709107a1913997fc9fabfa6cbe83b8712e90
parent a97efbc9cb46b0d5c98db211633a12c9f7f5fd71
author Barret Rhoden <brho@cs.berkeley.edu> 1582910415 -0500
committer Barret Rhoden <brho@cs.berkeley.edu> 1585780074 -0400

Add a simple watchdog

This watchdog will reboot the machine if core 0 is sufficiently wedged.
Typically, this happens when core 0 is stuck in IRQ context, so you
cannot interact with the machine remotely.  Unless you have decent
remote reset capabilities (which I don't), you can't get the machine
back.

The watchdog uses the HPET timer and an NMI, so it can still fire when
IRQs are disabled.

A reboot-worthy delay is defined as the watchdog ktask not running for
at least the time you requested.  This means that if you are stuck in
another kthread (including a non-blocking syscall) for long enough,
such as in a non-irqsave spinlock or just a long computation, you could
trigger the watchdog.  Recall that our kthread scheduler is
non-preemptive.

To set a watchdog that will wait at least 120 seconds before rebooting:

	echo on 120 > \#watchdog/ctl

To turn it off, if you want:

	echo off > \#watchdog/ctl

To see the status (on or off):

	cat \#watchdog/ctl

If you try to set it for longer than the timer / driver can handle,
we'll adjust it down to an appropriate time and send you a warning.

If you're working on a remote machine, I recommend turning the watchdog
on from your init.sh.
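
A minimal sketch of what such an init.sh entry might look like.  The
arm_watchdog helper, its arguments, and the existence check are
illustrative assumptions, not part of this commit; only the
"echo on <seconds>" write to the ctl file comes from the commands above.

```shell
# Hypothetical helper: arm the watchdog by writing "on <seconds>" to the
# ctl file.  The path and the timeout default to the values used in this
# commit message; both can be overridden by the caller.
arm_watchdog() {
	ctl="${1:-#watchdog/ctl}"
	secs="${2:-120}"

	# Assumption: skip quietly on kernels without the watchdog device.
	[ -e "$ctl" ] || return 1

	echo on "$secs" > "$ctl"
}

# Arm it early in boot; report, but do not abort init, if it is missing.
arm_watchdog '#watchdog/ctl' 120 || echo "watchdog not armed"
```

Pick a timeout comfortably longer than the longest stretch you expect a
kthread to run without blocking, since that also counts as a delay.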

Signed-off-by: Barret Rhoden <brho@cs.berkeley.edu>
