7. Troubleshooting

7.1 Terminating all active experiments

To terminate all active experiments a clean-up script is available in the IMUNES system. The script is invoked by issuing the command cleanupAll with root privileges. This script will terminate all running experiments.

# cleanupAll

7.1.1 Cleaning up hanging ZFS mounts

If a machine running an experiment is rebooted the experiment will not be available after boot but the ZFS mounts needed for network nodes will remain available after the restart. The cleanupAll tool is also used for destroying the remaining ZFS mounts.

7.2 Restoring original ZFS snapshot

NOTE: New versions of IMUNES use Unionfs instead of ZFS.

The ZFS snapshot used for replicating on virtual nodes can be corrupted or deleted. To restore the ZFS snapshot to the original state we first need to destroy the existing default snapshot named vroot/vroot@clean. Before destroying the ZFS snapshot make sure that no experiments are currently running by using the command himage -l and use the imunes -b eid option or the cleanupAll tool to terminate them. removal of the ZFS snapshot is done by issuing the following command with root user privileges:

# zfs destroy -fR vroot/vroot@clean

After the procedure is complete download the IMUNES tarball from the IMUNES website (http://imunes.tel.fer.hr/dl/imunes-1.0.tar.gz) and unpack it. Then enter the directory an run the command make vroot with root privileges to restore the ZFS snapshot. The procedure done as follows:

# tar xf imunes-1.0.tar.gz
# cd imunes_version
# make vroot_zfs

NOTE: To restore the snapshot an Internet connection must be available.

7.3 Obtaining kernel panic traces

In case a specific experiment configuration, workload type or hardware configuration triggers operating system crashes, obtaining traces of such events may be instrumental for successful debugging and resolving the observed operating system level bugs. The following procedure is recommended for collecting kernel panic traces:

Add the following line to the /etc/rc.conf file:

dumpdev="AUTO"

Create a new file /usr/local/etc/rc.d/textdump and populate it with the following script:

#!/bin/sh
#
# PROVIDE: textdump
# REQUIRE: LOGIN
#

ddb script kdb.enter.default="textdump set; capture on; bt;\
show reg; show pcpu; show vnets; call doadump; reset"

Set execution bit on /usr/local/etc/rc.d/textdump file:

# chmod +x /usr/local/etc/rc.d/textdump

Once rebooted, the machine should be from now on configured to automatically store kernel panic traces in /var/crash directory. Here's an example of collection of kernel panic trace files:

# ls -l /var/crash/
total 96
-rw-r--r--  1 root  wheel      2 Sep 20 00:06 bounds
-rw-------  1 root  wheel    440 Sep 19 23:57 info.0
-rw-------  1 root  wheel    442 Sep 20 00:00 info.1
-rw-------  1 root  wheel    442 Sep 20 00:06 info.2
-rw-------  1 root  wheel  24576 Sep 19 23:57 textdump.tar.0
-rw-------  1 root  wheel  39424 Sep 20 00:00 textdump.tar.1
-rw-------  1 root  wheel  24576 Sep 20 00:06 textdump.tar.2
#

Such "textdump" tarballs should be sumbitted along with any reports of kernel crashes.